Noah's ARK logo: various visual puns and oblique technical references in the official CMU color.

ARK researchers in April 2014. Courtesy of Greg Hanneman.

Noah's ARK[1] is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language—all viewed through a computational lens.

[1] The acronym is ambiguous; possible interpretations might include Ambiguity Research Kith or Ambiguity Resolution K. or A. R. Kibbutz. With apologies to the Bible and DAGS.

ARK Research: Awards, News Coverage


Resources and Tools

ARK researchers in April 2009. Picture by Mattt Thompson.
The following were developed by ARK researchers (*developed in whole or in part before joining ARK):


| Name | Position | Topics | Languages (Spoken and/or Written) | Languages (Researched) | Languages (Hacked In) | Favorite Term of Venery |
|---|---|---|---|---|---|---|
| Waleed Ammar | Ph.D. student, LTI | statistical machine translation, text analytics | Arabic, English | English, Arabic, Hebrew, Kinyarwanda | C#, C/C++, Javascript, ruby, Java, PHP | charm |
| David Bamman | Ph.D. student, LTI | sociolinguistic variation; statistical NLP for computational social science and the humanities | English, Latin, Ancient Greek, Italian (un po), French (un peu), German (ein bisschen), Mandarin Chinese (一点儿) | English, Latin, Ancient Greek, Chinese | Java, Python, Perl | coalition |
| Dallas Card | Ph.D. student, MLD | machine learning and statistical NLP for the social sciences and humanities | English | English | Python, Java, R, C#, C/C++ | siege |
| Jesse Dodge | M.S. student, LTI | semantic parsing and general text understanding, machine learning | English, French (un peu) | English | Java, Python, R | dazzle |
| Chris Dyer | Assistant Professor, LTI & MLD | machine translation, unsupervised learning, text-based forecasting, big data | English, German | Arabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Telugu, Turkish, Urdu, Welsh | C++, Perl, Java | conspiracy |
| Jeffrey Flanigan | Ph.D. student, LTI | semantic parsing, machine translation | English, French | Arabic, Chinese, English, Kinyarwanda, Urdu, Malagasy, Swahili | Scala, C/C++, Awk, Python, Java | murmuration |
| Lingpeng Kong | M.S. student, LTI | parsing, text analysis | Chinese, English | English, Chinese | Java, Python | clutter |
| Fei Liu | Postdoctoral fellow, ISR/LTI | summarization, social media, NLP, machine learning | Chinese, English | English, Chinese | Java, Python, C++, Matlab | bevy |
| Rohan Ramanath | M.S. student, LTI | machine learning, text analysis, NLP, crowdsourcing | English, Tamil, Hindi, Kannada | English | Python/Django, C/C++, Javascript, Java, C#, HTML/CSS | ambush |
| Bryan Routledge | Associate Professor of finance, Tepper | Finance, asset pricing | English, Canadian | N/A | Matlab, R, Stata, Perl, Excel, Cobol | pod |
| Yanchuan Sim | Ph.D. student, LTI | Bayesian graphical modeling, text mining | English, Chinese (Mandarin, Teochew, Cantonese, Hokkien) | English | C/C++, Java, Python | exaltation |
| Swabha Swayamdipta | Ph.D. student, LTI | structured prediction, machine learning | English, Hindi, Oriya | English | Python, Java, C/C++, Matlab | waddling |
| Sam Thomson | Ph.D. student, LTI | semantic parsing | English, Spanish | English | Python, Java, JavaScript, R | murder |
| Dani Yogatama | Ph.D. student, LTI | machine learning for NLP | Indonesian, Japanese, English | English, Japanese, French, Spanish | C/C++, Java, Python, Matlab, R | band |
| Noah Smith | Associate Professor, LTI & MLD | (most of the above) | English, French (un peu) | Arabic, Bulgarian, Czech, English, French, German, Hebrew, Korean, Mandarin, Portuguese, Turkish | LaTeX | parade |


(If you're one of these people and the information above is not current, it's time to get in touch!)


Our research has been/is supported in part by DARPA (through the Computer Science Study Panel program, grant numbers HR-00110110013, NBCH-1080004, and N10AP20042; and the DEFT program), IARPA (grant number N10PC20222 and the OSI program), the National Science Foundation (grant numbers IIS-0713265, IIS-0836431, IIS-0844507, IIS-0915187, CAREER IIS-1054319, IIS-1211277, IIS-1251131, and a graduate fellowship to Michael Heilman), the Army Research Office (grant number W911NF-10-1-0533), the Qatar National Research Fund (grant number NPRP 08-485-1-083), Sandia National Laboratories (a graduate fellowship to Kevin Gimpel), the Sloan Foundation, the Information and Communication Technologies Institute (a graduate fellowship to André Martins), the Singapore Agency for Science, Technology, and Research (a graduate fellowship to Yanchuan Sim), a gift from the Berkman Faculty Development Fund at CMU, an IBM faculty award, a Q-Group award, grants from Google, and computational resources provided by Yahoo, the Pittsburgh Supercomputing Center, and Amazon Web Services.
