Lexical Semantics Resources for English
This page provides the following resources for computational analysis of English lexical semantics:
- comprehensive annotations of multiword expressions, and of noun and verb supersenses, for a 55,000-token corpus of web reviews; and
- a tool (trained on the corpus) that identifies multiword expressions and supersenses in context.
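Comprehensive MWE identification has to handle expressions that are interrupted by other words (e.g. "picked the hot dog up", where "picked ... up" surrounds a second MWE, "hot dog"). One common way to make this a sequence-tagging task is a BIO-style scheme with separate tags for tokens inside a gap. The sketch below decodes such tags into groups of token positions. It is only an illustration under stated assumptions: the six-tag scheme (O/B/I outside a gap, o/b/i inside a gap) is a simplified, hypothetical one; `decode_mwe_groups` is a hypothetical helper; and this is neither the CMWE/STREUSLE file format nor the AMALGr interface. It also does not check every well-formedness constraint (for example, a B that is never followed by an I).

```python
from typing import List, Optional


def decode_mwe_groups(tags: List[str]) -> List[List[int]]:
    """Group token positions into multiword expressions (MWEs).

    Assumes a simplified, hypothetical gappy BIO scheme:
      O / o : token not in an MWE, outside / inside a gap
      B / b : first token of an MWE, outside / inside a gap
      I / i : later token of the MWE most recently opened by B / b
    Groups are returned in order of their first token.
    """
    groups: List[List[int]] = []
    top: Optional[List[int]] = None    # MWE opened by B (may contain a gap)
    inner: Optional[List[int]] = None  # MWE opened by b (lives inside a gap)
    for pos, tag in enumerate(tags):
        if tag == "O":
            top, inner = None, None        # outside everything: close both
        elif tag == "B":
            top, inner = [pos], None
            groups.append(top)
        elif tag == "I":
            if top is None:
                raise ValueError(f"I at position {pos} has no open B")
            top.append(pos)
            inner = None                   # returning to the outer MWE ends the gap
        elif tag == "o":
            if top is None:
                raise ValueError(f"o at position {pos} is not inside a gap")
            inner = None                   # a plain in-gap token ends any in-gap MWE
        elif tag == "b":
            if top is None:
                raise ValueError(f"b at position {pos} is not inside a gap")
            inner = [pos]
            groups.append(inner)
        elif tag == "i":
            if inner is None:
                raise ValueError(f"i at position {pos} has no open b")
            inner.append(pos)
        else:
            raise ValueError(f"unknown tag {tag!r} at position {pos}")
    return groups


if __name__ == "__main__":
    tokens = ["They", "picked", "the", "hot", "dog", "up"]
    tags = ["O", "B", "o", "b", "i", "I"]
    groups = decode_mwe_groups(tags)
    print([" ".join(tokens[i] for i in g) for g in groups])
    # prints ['picked up', 'hot dog']
```

Keeping a separate "inner" group is what lets one MWE sit inside the gap of another without the two being merged; a supersense label could then be attached to each decoded group.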
These were developed by , and others at Carnegie Mellon University.
- The Comprehensive Multiword Expressions corpus: CMWE 1.0 (README.md, LICENSE). It was described in the LREC paper and used for the TACL paper, and it has since been superseded by STREUSLE 2.0.
- The version of the multiword expression identification system used in the TACL paper: AMALGr 1.0
The multiword expression annotations are described in:
The original MWE identification system is described in:
The supersense annotations and the combined MWE+supersense tagger are described in:
Additional details may be found in Nathan Schneider's dissertation.
This research was supported in part by NSF CAREER grant IIS-1054319, by Google through the "Reading is Believing" project at CMU, and by DARPA grant FA8750-12-2-0342, funded under the DEFT program.
We are grateful to Google and LDC for permission to redistribute their data along with our annotations.
Please e-mail nschneid [strudel] cs.cmu.edu with questions.