Lexical Semantics Resources for English

This collection consists of resources for computational analysis of English lexical semantics. It includes comprehensive multiword expression annotations for a 55,000-token corpus of web reviews and a tool, trained on that corpus, that identifies multiword expressions in context. These were developed by Nathan Schneider, Noah Smith, and others at Carnegie Mellon University.


Further Reading

Please cite the following if you write any papers involving the use of the data above:

A system trained on this corpus is described in:


This research was supported in part by NSF CAREER grant IIS-1054319, by Google through the Reading is Believing project at CMU, and by DARPA grant FA8750-12-2-0342, funded under the DEFT program.


Please e-mail nschneid [strudel] cs.cmu.edu with questions.