ARKref is a Java implementation of a syntactically rich, rule-based within-document coreference system. It is very similar to the syntactic components of the system of Haghighi and Klein (2009). It is useful as a starting point for incorporating coreference into larger information extraction and natural language processing systems. For example, it can be adapted to a variety of applications by tweaking the gazetteers, customizing mention identification, or turning the syntactic rules into log-linear features.
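As a rough illustration of what "turning the syntactic rules into log-linear features" could look like, the sketch below treats each rule's yes/no outcome as a binary feature and combines the outcomes with a logistic (two-class log-linear) scorer. The class name, feature names, and weights are all hypothetical; nothing here is taken from the ARKref code base, and in practice the weights would be learned from annotated data rather than set by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: these names are illustrative and do not come from ARKref.
class LogLinearSketch {

    // Each rule outcome is a binary feature; a rule that fired adds its weight
    // to the linear score, which is then squashed to a probability.
    static double score(Map<String, Boolean> ruleOutcomes,
                        Map<String, Double> weights,
                        double bias) {
        double z = bias;
        for (Map.Entry<String, Boolean> e : ruleOutcomes.entrySet()) {
            if (e.getValue()) {
                // Rules without a weight contribute nothing.
                z += weights.getOrDefault(e.getKey(), 0.0);
            }
        }
        return 1.0 / (1.0 + Math.exp(-z)); // logistic link
    }

    public static void main(String[] args) {
        // Made-up weights for two made-up rules.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("appositive", 2.0);
        weights.put("genderMismatch", -3.0);

        // Outcomes of the rules for one candidate mention pair.
        Map<String, Boolean> outcomes = new LinkedHashMap<>();
        outcomes.put("appositive", true);
        outcomes.put("genderMismatch", false);

        System.out.println(score(outcomes, weights, 0.0));
    }
}
```

The point of this design is that a hard rule (accept or reject a mention pair) becomes soft evidence: several weak rules can jointly outweigh one strong rule, and the tradeoff is set by the weights rather than by rule ordering.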
It was developed by Brendan O'Connor and Michael Heilman, students in Noah Smith's ARK research group.
Enter a few sentences of text in the box below (or click on an example) and then click "Resolve." The system will take a few seconds per sentence to respond.
Download a release: arkref-20110321.tgz
We are working on a full technical report describing the system. In the meantime, a preliminary document is available with the code. Please first read:
Aria Haghighi and Dan Klein. Simple Coreference Resolution with Rich Syntactic and Semantic Features. EMNLP 2009.
Out of the box, it performs about as well as Haghighi and Klein's 2009 system on the development data set (we have not evaluated on the test data set). Its F-score is slightly higher, and the precision/recall tradeoff is different. Note that there is no semantic compatibility subsystem ("+SEM-COMPAT") and that we use a supersense tagger (Ciaramita and Altun, EMNLP 2006) rather than a named entity recognizer.
ARKref requires a phrase structure parser. We use the Stanford Parser and include it in the download. ARKref also makes heavy use of the Stanford Tregex library to implement its syntactic rules.
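To give a feel for what a syntactic rule over a phrase structure parse looks like, here is a minimal sketch that finds appositive constructions of the kind discussed by Haghighi and Klein (2009): an NP whose children are an NP, a comma, and another NP. It deliberately uses a toy tree class and a hand-written traversal instead of the Stanford Parser and Tregex, so that it runs without any dependencies; `ToyTree` and `AppositiveRuleSketch` are hypothetical names, and this is not how ARKref itself expresses the rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy constituency tree: a stand-in for a real parser's tree type,
// not the Stanford API.
class ToyTree {
    final String label;
    final List<ToyTree> children;

    ToyTree(String label, ToyTree... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

// Hypothetical rule: report every NP whose children are exactly NP , NP.
class AppositiveRuleSketch {

    static List<ToyTree> findAppositives(ToyTree root) {
        List<ToyTree> hits = new ArrayList<>();
        collect(root, hits);
        return hits;
    }

    // Pre-order traversal that tests the pattern at every node.
    private static void collect(ToyTree node, List<ToyTree> hits) {
        List<ToyTree> c = node.children;
        if (node.label.equals("NP") && c.size() == 3
                && c.get(0).label.equals("NP")
                && c.get(1).label.equals(",")
                && c.get(2).label.equals("NP")) {
            hits.add(node);
        }
        for (ToyTree child : c) {
            collect(child, hits);
        }
    }
}
```

In a rule-based system, the two inner NPs of each match would then be treated as candidate coreferent mentions. A tree pattern language such as Tregex lets the same structural test be written declaratively instead of as traversal code.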