AQMAR Arabic Wikipedia Named Entity Corpus & Tagger

These resources were developed by Behrang Mohit, Nathan Schneider, Rishav Bhowmick, Kemal Oflazer, and Noah Smith as part of the AQMAR project.

Corpus

This is a 74,000-token corpus of 28 Arabic Wikipedia articles hand-annotated for named entities.

Tagger

This is a tagger for Arabic text, implemented in Java and distributed with a pretrained named entity model.

Further Reading

Please cite the following in any paper that uses the data above:

Acknowledgments

This research was supported by Qatar National Research Fund grant NPRP 08-485-1-083.

Contact

Please e-mail behrang [strudel] cmu.edu or nschneid [strudel] cs.cmu.edu with questions.