Twitter Part-of-Speech Tagging

This page provides links to a dataset of tweets manually annotated with part-of-speech tags, a part-of-speech tagger trained on this data, and a simple browser-based interface for POS annotation. These data and tools were prepared by Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah Smith.

Download

Update (11/8/2011): Major speed improvement to the tagger! Get the latest version from GitHub.

To receive announcements about updates, join the ARK-tools mailing list.

Further Reading

Please cite this paper if you write any papers involving the use of the data above:

Acknowledgments

This research was supported in part by the NSF through CAREER grant IIS-1054319, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number W911NF-10-1-0533, Sandia National Laboratories (fellowship to K. Gimpel), and the U.S. Department of Education under IES grant R305B040063 (fellowship to M. Heilman).