Noah's ARK logo: various visual puns and oblique technical references in the official CMU color.

ARK researchers. Picture by Mattt Thompson.

Noah's ARK[1] is Noah Smith's informal research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. (The research is formal; the group is informal.) As you may have guessed, our research focuses on problems of ambiguity and uncertainty in natural language processing, including morphology, syntax, semantics, translation, and behavioral/social phenomena observed through language.

[1] The acronym is ambiguous; possible interpretations might include Ambiguity Research Kith or Ambiguity Resolution K. or A. R. Kibbutz. With apologies to the Bible and DAGS.

ARK Research in News and Blogs

August 2009: André, Noah, and Eric received a best paper award at the 2009 Annual Meeting of the Association for Computational Linguistics.

April 2009: IBM announced a Jeopardy!-playing computer program. We saw that coming two years ago!

December 2008: Shay, Rob, and Noah received the best student paper award at the 2008 International Conference on Logic Programming.

November 2008: Shimon, Bryan, Jacob, and Noah received a research award from the Q-Group (Institute for Quantitative Research in Finance) to study text-based portfolio choice.

May 2008: Little Green Footballs, a political blog, happened on some data Tae had on her website, prompting fascinating (to us) speculation about what we were up to. See Noah and William's response here.

January 2008: The New Scientist and Tech Digest blogs commented on Danny and Noah's relative keyboard.

November 2007: Yahoo granted us access to M45, a 4,000-processor supercomputer.

Demos

Resources and Tools

Projects

Researchers

(This list includes only ARK researchers without missing data!)
| Name | Position | Topics | Languages (Spoken and/or Written) | Languages (Researched) | Languages (Hacked In) | Favorite Term of Venery |
|---|---|---|---|---|---|---|
| Shay Cohen | Ph.D. student, LTI | morphosyntactic parsing, unsupervised parsing, parsing algorithms | Hebrew, English | Arabic, Chinese, English, German, Hebrew | C++, C, Perl, Java, Matlab | shoal |
| Dipanjan Das | Ph.D. student, LTI | sentence-sentence semantic relationships, parsing (RAVINE) | English, Bengali, Hindi, Sanskrit (swalpam) | English, Bengali | Java, C, C++, Perl | bloat |
| Kevin Gimpel | Ph.D. student, LTI | statistical NLP and translation (INCA) | English, German, Italian | English, Portuguese, German, Chinese, Arabic | C/C++, Perl, Java, Matlab | crash |
| Michael Heilman | Ph.D. student, LTI | question-answer modeling (RAVINE) | English, Japanese (sukoshi) | English, English as a second language | Java, Perl, PHP, Ruby, C++ | mob |
| André Martins | Ph.D. student, LTI and Universidade Técnica de Lisboa | structured and kernel machine learning, parsing | Portuguese, English, Spanish, German (ein bisschen), French (un petit peu) | Portuguese, English, Spanish | C, C++, Matlab | scurry |
| Behrang Mohit | Post-doc, CMU-Q | Arabic NLP | | | | |
| Brendan O'Connor | M.S. student, LTI | text analysis and social science | English, German, Chinese (一点点) | English | R, Awk, etc. | prickle |
| Nate Schneider | Ph.D. student, LTI | parsing, grammar learning, semantics; cognitive linguistics (RAVINE) | English, Hebrew (קצת), Arabic (قليل), French (un peu) | English, Hebrew | Python, Java, PHP, Scheme, C#, C++ | smack |
| Tae Yano | Ph.D. student, LTI | NLP in the political domain, rich models of structured NL data (e.g., blogs) | Japanese, English, Spanish, French | English | C, C++, Java, Perl, Python | husk |
| Noah Smith | Assistant Professor, LTI & MLD | (all of the above) | English, French (un peu) | Arabic, Bulgarian, Czech, English, French, German, Hebrew, Korean, Mandarin, Portuguese, Turkish | LaTeX | parade |

Alumni

Acknowledgments

Our research has been/is supported in part by the DARPA Computer Science Study Panel program (grant numbers HR-00110110013 and NBCH-1080004), the National Science Foundation (grant numbers IIS-0713265, IIS-0836431, IIS-0844507, IIS-0915187, and a graduate fellowship to Michael Heilman), the Qatar National Research Foundation (grant number NPRP 08-485-1-083), a gift from the Berkman Faculty Development Fund at CMU, an IBM faculty award, a Q-Group award, a grant from Google, and computational resources provided by Yahoo.
