Noah's ARK logo: various visual puns and oblique technical references in the official CMU color.

ARK researchers in May 2011. Picture by Greg Hanneman.

Noah's ARK[1] is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language—all viewed through a computational lens.

[1] The acronym is ambiguous; possible interpretations might include Ambiguity Research Kith or Ambiguity Resolution K. or A. R. Kibbutz. With apologies to the Bible and DAGS.

ARK Research: Awards, News Coverage


ARK researchers in April 2009. Picture by Mattt Thompson.

Demos

Resources and Tools

The following were developed by ARK researchers (*developed in whole or in part before joining ARK):

Projects

This is a list of multi-PI sponsored projects; it is not an exhaustive list of ARK research activities! Past projects:

Researchers

| Name | Position | Topics | Languages (Spoken and/or Written) | Languages (Researched) | Languages (Hacked In) | Favorite Term of Venery |
|---|---|---|---|---|---|---|
| Waleed Ammar | Ph.D. student, LTI | statistical machine translation, text analytics | Arabic, English | English, Arabic, Hebrew | C#, C/C++, Javascript, ruby, Java, PHP, ASP.net | charm |
| David Bamman | Ph.D. student, LTI | sociolinguistic variation; statistical NLP for computational social science and the humanities | English, Latin, Ancient Greek, Italian (un po), French (un peu), German (ein bisschen), Mandarin Chinese (一点儿) | English, Latin, Ancient Greek, Chinese | Java, Python, Perl | coalition |
| Victor Chahuneau | M.S. student, LTI | | | | | |
| Dipanjan Das | Ph.D. student, LTI | semantic and syntactic parsing, sentence-sentence semantic relationships, semi-supervised learning | English, Bengali, Hindi, Sanskrit (swalpam) | English, Bengali | Java, C, C++, Perl | bloat |
| Chris Dyer | Post-doc, LTI | machine translation, unsupervised learning, text-based forecasting, big data | English, German | Arabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Telugu, Turkish, Urdu, Welsh | C++, Perl, Java | conspiracy |
| Kevin Gimpel | Ph.D. student, LTI | statistical NLP and translation | English, German, Italian | English, Portuguese, German, Chinese, Arabic, Spanish, French, Urdu | C/C++, Perl, Java, Matlab | crash |
| André Martins | Ph.D. student, LTI and Universidade Técnica de Lisboa | structured and kernel machine learning, parsing | Portuguese, English, Spanish, German (ein bisschen), French (un petit peu) | Portuguese, English, Spanish | C, C++, Matlab | scurry |
| Behrang Mohit | Post-doc, CMU-Q | Arabic NLP, machine translation, semantics | English, Persian (Farsi), Arabic | English, Arabic | Java, Python | |
| Brendan O'Connor | Ph.D. student, MLD | text analysis and social science | English, German | English, Chinese | R, Awk, etc. | prickle |
| Nathan Schneider | Ph.D. student, LTI | linguistic structure discovery; semantics; cognitive linguistics | English, Hebrew (קצת), Arabic (قليل), French (un peu), German (ein bisschen) | English, Hebrew | Python, Java, PHP, Scheme, Javascript | smack |
| Yanchuan Sim | Ph.D. student, LTI | Bayesian graphical modeling, text mining | English, Chinese (Mandarin, Teochew, Cantonese, Hokkien) | English | C/C++, Java, Python | exaltation |
| Tae Yano | Ph.D. student, LTI | NLP in the political domain, rich models of structured NL data (e.g., blogs) | Japanese, English, Spanish, French | English | C, C++, Java, Perl, Python | husk |
| Dani Yogatama | M.S. student, LTI | text-driven forecasting | Indonesian, Japanese, English | English, Japanese, French, Spanish | C/C++, Java, Python, Matlab | band |
| Bryan Routledge | Associate Professor of Finance, Tepper | Finance, asset pricing | English, Canadian | N/A | Matlab, R, Stata, Perl, Excel, Cobol | pod |
| Noah Smith | Associate Professor, LTI & MLD | (all of the above) | English, French (un peu) | Arabic, Bulgarian, Czech, English, French, German, Hebrew, Korean, Mandarin, Portuguese, Turkish | LaTeX | parade |

Alumni

(If you're one of these people and the information below is not current, it's time to get in touch!)

Acknowledgments

Our research has been/is supported in part by the DARPA Computer Science Study Panel program (grant numbers HR-00110110013, NBCH-1080004, and N10AP20042), IARPA (grant number N10PC20222), the National Science Foundation (grant numbers IIS-0713265, IIS-0836431, IIS-0844507, IIS-0915187, CAREER IIS-1054319, and a graduate fellowship to Michael Heilman), the Army Research Office (grant number W911NF-10-1-0533), the Qatar National Research Foundation (grant number NPRP 08-485-1-083), Sandia National Laboratories (a graduate fellowship to Kevin Gimpel), the Information and Communication Technologies Institute (a graduate fellowship to André Martins), the Singapore Agency for Science, Technology, and Research (a graduate fellowship to Yanchuan Sim), a gift from the Berkman Faculty Development Fund at CMU, an IBM faculty award, a Q-Group award, grants from Google, and computational resources provided by Yahoo, the Pittsburgh Supercomputing Center, and Amazon Web Services.
