Noah's ARK logo: various visual puns and oblique technical references in the official CMU color.

ARK researchers in October 2012.

Noah's ARK[1] is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language—all viewed through a computational lens.

[1] The acronym is ambiguous; possible interpretations might include Ambiguity Research Kith, Ambiguity Resolution K., or A. R. Kibbutz. With apologies to the Bible and DAGS.

ARK Research: Awards, News Coverage

Demos

Resources and Tools

ARK researchers in April 2009. Picture by Mattt Thompson.
The following were developed by ARK researchers (*developed in whole or in part before joining ARK):

Researchers

Name | Position | Topics | Languages (Spoken and/or Written) | Languages (Researched) | Languages (Hacked In) | Favorite Term of Venery
Waleed Ammar | Ph.D. student, LTI | statistical machine translation, text analytics | Arabic, English | English, Arabic, Hebrew, Kinyarwanda | C#, C/C++, JavaScript, Ruby, Java, PHP, ASP.NET | charm
David Bamman | Ph.D. student, LTI | sociolinguistic variation; statistical NLP for computational social science and the humanities | English, Latin, Ancient Greek, Italian (a little), French (a little), German (a little), Mandarin Chinese (a little) | English, Latin, Ancient Greek, Chinese | Java, Python, Perl | coalition
Victor Chahuneau | M.S. student, LTI | | | | |
Chris Dyer | Assistant Professor, LTI & MLD | machine translation, unsupervised learning, text-based forecasting, big data | English, German | Arabic, Chinese, Czech, Dutch, English, French, German, Hungarian, Telugu, Turkish, Urdu, Welsh | C++, Perl, Java | conspiracy
Behrang Mohit | Post-doc, CMU-Q | Arabic NLP, machine translation, semantics | English, Persian (Farsi), Arabic | English, Arabic | Java, Python |
Brendan O'Connor | Ph.D. student, MLD | text analysis and social science | English, German | English, Chinese | R, Awk, etc. | prickle
Bryan Routledge | Associate Professor of Finance, Tepper | finance, asset pricing | English, Canadian | N/A | Matlab, R, Stata, Perl, Excel, Cobol | pod
Nathan Schneider | Ph.D. student, LTI | semantics and its relation to linguistic structure; cognitive linguistics | English, Hebrew (a little), Arabic (a little), French (a little), German (a little) | English, Hebrew, Arabic | Python, Java, PHP, Scheme, JavaScript | smack
Yanchuan Sim | Ph.D. student, LTI | Bayesian graphical modeling, text mining | English, Chinese (Mandarin, Teochew, Cantonese, Hokkien) | English | C/C++, Java, Python | exaltation
Sam Thomson | M.S. student, LTI | semantic parsing | English, Spanish | English | Python, Java, JavaScript, R | murder
Tae Yano | Ph.D. student, LTI | NLP in the political domain, rich models of structured NL data (e.g., blogs) | Japanese, English, Spanish, French | English | C, C++, Java, Perl, Python | husk
Dani Yogatama | Ph.D. student, LTI | text-driven forecasting | Indonesian, Japanese, English | English, Japanese, French, Spanish | C/C++, Java, Python, Matlab | band
Noah Smith | Associate Professor, LTI & MLD | (most of the above) | English, French (a little) | Arabic, Bulgarian, Czech, English, French, German, Hebrew, Korean, Mandarin, Portuguese, Turkish | LaTeX | parade

Alumni

(If you're one of these people and the information below is not current, it's time to get in touch!)

Acknowledgments

Our research has been and is supported in part by the DARPA Computer Science Study Panel program (grant numbers HR-00110110013, NBCH-1080004, and N10AP20042), IARPA (grant number N10PC20222), the National Science Foundation (grant numbers IIS-0713265, IIS-0836431, IIS-0844507, IIS-0915187, CAREER IIS-1054319, and a graduate fellowship to Michael Heilman), the Army Research Office (grant number W911NF-10-1-0533), the Qatar National Research Fund (grant number NPRP 08-485-1-083), Sandia National Laboratories (a graduate fellowship to Kevin Gimpel), the Information and Communication Technologies Institute (a graduate fellowship to André Martins), the Singapore Agency for Science, Technology and Research (a graduate fellowship to Yanchuan Sim), a gift from the Berkman Faculty Development Fund at CMU, an IBM faculty award, a Q-Group award, grants from Google, and computational resources provided by Yahoo, the Pittsburgh Supercomputing Center, and Amazon Web Services.
