Noah's ARK logo: various visual puns and oblique technical references in the official CMU color.

ARK researchers in October 2012.

Noah's ARK[1] is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language—all viewed through a computational lens.

[1] The acronym is ambiguous; possible interpretations might include Ambiguity Research Kith or Ambiguity Resolution K. or A. R. Kibbutz. With apologies to the Bible and DAGS.

ARK Research: Awards, News Coverage

Demos

Resources and Tools

ARK researchers in April 2009. Picture by Mattt Thompson.
The following were developed by ARK researchers (*developed in whole or in part before joining ARK):

Researchers

| Name | Position | Topics | Languages (Spoken and/or Written) | Languages (Researched) | Languages (Hacked In) | Favorite Term of Venery |
|---|---|---|---|---|---|---|
| Waleed Ammar | Ph.D. student, LTI | statistical machine translation, text analytics | Arabic, English | English, Arabic, Hebrew, Kinyarwanda | C#, C/C++, Javascript, ruby, Java, PHP, ASP.net | charm |
| David Bamman | Ph.D. student, LTI | sociolinguistic variation; statistical NLP for computational social science and the humanities | English, Latin, Ancient Greek, Italian (un po), French (un peu), German (ein bisschen), Mandarin Chinese (一点儿) | English, Latin, Ancient Greek, Chinese | Java, Python, Perl | coalition |
| Dallas Card | Ph.D. student, MLD | machine learning and statistical NLP for the social sciences and humanities | English | English | Python, Java, R, C#, C/C++ | siege |
| Jesse Dodge | M.S. student, LTI | semantic parsing and general text understanding, machine learning | English, French (un peu) | English | Java, Python, R | dazzle |
| Chris Dyer | Assistant Professor, LTI & MLD | machine translation, unsupervised learning, text-based forecasting, big data | English, German | Arabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Telugu, Turkish, Urdu, Welsh | C++, Perl, Java | conspiracy |
| Jeffrey Flanigan | Ph.D. student, LTI | semantic parsing, machine translation | English, French | Arabic, Chinese, English, Kinyarwanda, Urdu, Malagasy, Swahili | Scala, C/C++, Awk, Python, Java | murmuration |
| Lingpeng Kong | M.S. student, LTI | parsing, text analysis | Chinese, English | English, Chinese | Java, Python | clutter |
| Fei Liu | Postdoctoral fellow, ISR/LTI | summarization, social media, NLP, machine learning | Chinese, English | English, Chinese | Java, Python, C++, Matlab | bevy |
| Bill McDowell | research programmer, LTI | semantics, machine learning, computational approaches to argumentation | English | English | Java, C#, Javascript, ... | unkindness |
| Brendan O'Connor | Ph.D. student, MLD | text analysis and social science | English, German | English, Chinese | R, Awk, etc. | prickle |
| Rohan Ramanath | M.S. student, LTI | machine learning, text analysis, NLP, crowdsourcing | English, Tamil, Hindi, Kannada | English | Python/Django, C/C++, Javascript, Java, C#, HTML/CSS | ambush |
| Bryan Routledge | Associate Professor of finance, Tepper | Finance, asset pricing | English, Canadian | N/A | Matlab, R, Stata, Perl, Excel, Cobol | pod |
| Nathan Schneider | Ph.D. student, LTI | semantics and its relation to linguistic structure; cognitive linguistics | English, Hebrew (קצת), Arabic (قليل), French (un peu), German (ein bisschen) | English, Hebrew, Arabic | Python, Java, PHP, Scheme, Javascript | smack |
| Yanchuan Sim | Ph.D. student, LTI | Bayesian graphical modeling, text mining | English, Chinese (Mandarin, Teochew, Cantonese, Hokkien) | English | C/C++, Java, Python | exaltation |
| Swabha Swayamdipta | Ph.D. student, LTI | structured prediction, machine learning | English, Hindi, Oriya | English | Python, Java, C/C++, Matlab | waddling |
| Sam Thomson | M.S. student, LTI | semantic parsing | English, Spanish | English | Python, Java, JavaScript, R | murder |
| Dani Yogatama | Ph.D. student, LTI | text-driven forecasting | Indonesian, Japanese, English | English, Japanese, French, Spanish | C/C++, Java, Python, Matlab | band |
| Noah Smith | Associate Professor, LTI & MLD | (most of the above) | English, French (un peu) | Arabic, Bulgarian, Czech, English, French, German, Hebrew, Korean, Mandarin, Portuguese, Turkish | LaTeX | parade |

Alumni

(If you're one of these people and the information below is not current, it's time to get in touch!)

Acknowledgments

Our research has been/is supported in part by DARPA (through the Computer Science Study Panel program, grant numbers HR-00110110013, NBCH-1080004, and N10AP20042; and the DEFT program), IARPA (grant number N10PC20222 and the OSI program), the National Science Foundation (grant numbers IIS-0713265, IIS-0836431, IIS-0844507, IIS-0915187, CAREER IIS-1054319, IIS-1211277, IIS-1251131, and a graduate fellowship to Michael Heilman), the Army Research Office (grant number W911NF-10-1-0533), the Qatar National Research Foundation (grant number NPRP 08-485-1-083), Sandia National Laboratories (a graduate fellowship to Kevin Gimpel), the Sloan Foundation, the Information and Communication Technologies Institute (a graduate fellowship to André Martins), the Singapore Agency for Science, Technology, and Research (a graduate fellowship to Yanchuan Sim), a gift from the Berkman Faculty Development Fund at CMU, an IBM faculty award, a Q-Group award, grants from Google, and computational resources provided by Yahoo, the Pittsburgh Supercomputing Center, and Amazon Web Services.
