TBSD: Using TurboParser to Get Stanford Dependencies
Background
Stanford typed dependencies are a widely desired representation of
natural language sentences,
but parsing is one of the major computational bottlenecks in text
analysis systems. In light of the evolving definition of the Stanford
dependencies and developments in statistical dependency parsing
algorithms, this paper revisits the question of
Cer et al. (2010):
what is the tradeoff between accuracy and
speed in obtaining Stanford dependencies in particular? We also explore the
effects of input representations on this tradeoff: part-of-speech tags, the novel
use of an alternative dependency representation as input, and
distributional representations of words. We find that direct
dependency parsing is a more viable solution than it was found to be
in the past.
Further Reading
The main technical ideas behind this software appear in the paper:
Download
Pre-trained Models
For Stanford Dependencies v3.3.0, we trained three models that generate Stanford basic dependencies: full, standard, and basic. They require TurboParser v2.1.0.
Software Package
We provide the code for our additional inference rules, along with scripts that make it convenient to get Stanford Dependencies from raw input files, here.
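For readers who want to consume the parser output in their own code, the following is a minimal sketch, not part of the released package. It assumes the output is written in the tab-separated, 10-column CoNLL-X format with blank lines between sentences; this page does not state the output format, so check it against the files the scripts actually produce. The names `read_conll_dependencies` and `_to_triples` are hypothetical helpers introduced only for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# (relation, head word, dependent word)
Dependency = Tuple[str, str, str]


def _to_triples(rows: List[Tuple[int, str, int, str]]) -> List[Dependency]:
    """Turn (id, form, head_id, relation) rows into dependency triples."""
    forms = {idx: form for idx, form, _, _ in rows}
    forms[0] = "ROOT"  # head index 0 marks the artificial root token
    return [(rel, forms[head], form) for _, form, head, rel in rows]


def read_conll_dependencies(lines: Iterable[str]) -> Iterator[List[Dependency]]:
    """Yield one list of dependency triples per sentence.

    Assumed column layout (CoNLL-X):
    ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
    """
    sentence: List[Tuple[int, str, int, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield _to_triples(sentence)
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if sentence:
        # Handle a final sentence with no trailing blank line.
        yield _to_triples(sentence)


if __name__ == "__main__":
    sample = [
        "1\tDogs\t_\tNNS\tNNS\t_\t2\tnsubj\t_\t_",
        "2\tbark\t_\tVBP\tVBP\t_\t0\troot\t_\t_",
        "",
    ]
    for deps in read_conll_dependencies(sample):
        for rel, head, dep in deps:
            print(f"{rel}({head}, {dep})")
```

To read a file instead of the in-memory sample, pass an open file handle, for example `read_conll_dependencies(open(path, encoding="utf-8"))`, where `path` points at the parser output.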
Pre-trained Tagging Models
In the script we provide, we use TurboTagger to perform the POS tagging. Here we offer a tagging model trained on sections 02-21 of the Penn Treebank.
For questions, bug fixes and comments, please e-mail lingpenk [strudel] cs.cmu.edu.