Flexible Learning for Natural Language Processing

Statistical learning is now central to natural language processing (NLP). Bridging the gap between learning and linguistic representation requires going beyond learning parameters. This CAREER project addresses three challenging, unresolved questions: (1) Given recent advances in learning the parameters of linguistic models and in approximate inference, how can the process of feature design be automated? (2) Given that NLP tasks are often defined without recourse to real applications and that a specific annotated dataset is unlikely to fulfill the needs of multiple NLP projects, can learning frameworks be extended to perform automatic task refinement, simplifying a linguistic analysis task to obtain more consistent, more precise, or faster performance? (3) Can computational models of language take into account the non-text context in which our linguistic data are embedded? Building on recent success in social text analysis and text-driven forecasting, this CAREER project seeks to exploit context to refine models of linguistic structure while enabling advances in this application area.

Project Personnel

Noah Smith (PI), Carnegie Mellon University School of Computer Science
André Martins (former member), now at Priberam Labs
Brendan O'Connor (Ph.D. student), Carnegie Mellon University School of Computer Science
Tobi Owoputi (undergraduate), Carnegie Mellon University School of Computer Science
Yanchuan Sim (Ph.D. student), Carnegie Mellon University School of Computer Science
Tae Yano (Ph.D. student), Carnegie Mellon University School of Computer Science

Downloads

Publications

Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers. André F. T. Martins, Miguel Almeida, and Noah A. Smith. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2013), Sofia, Bulgaria, August 2013.
Linguistic Structure Prediction with the Sparseptron. Noah A. Smith and André F. T. Martins. ACM Crossroads 19(3):44–48, April 2013.
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters. Olutobi Owoputi, Brendan O'Connor, Chris Dyer, Kevin Gimpel, Nathan Schneider, and Noah A. Smith. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2013), Atlanta, GA, June 2013.
AD³: Alternating Directions Dual Decomposition for MAP Inference in Graphical Models. André F. T. Martins, Mário A. T. Figueiredo, Pedro M. Q. Aguiar, Noah A. Smith, and Eric P. Xing.
Mapping the Geographical Diffusion of New Words. Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. In Proceedings of the NIPS Workshop on Social Network and Social Media Analysis: Methods, Models and Applications, Lake Tahoe, NV, December 2012.
Word Salad: Relating Food Prices and Descriptions. Victor Chahuneau, Kevin Gimpel, Bryan R. Routledge, Lily Scherlis, and Noah A. Smith. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), Jeju, Korea, July 2012. Also available: appendix.
An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints. Dipanjan Das, André F. T. Martins, and Noah A. Smith. In Proceedings of the Joint Conference on Lexical and Computational Semantics (*SEM 2012), Montréal, Québec, June 2012.
Structured Ramp Loss Minimization for Machine Translation. Kevin Gimpel and Noah A. Smith. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2012), Montréal, Québec, June 2012.
Censorship and Content Deletion in Chinese Social Media. David Bamman, Brendan O'Connor, and Noah A. Smith. First Monday 17(3), March 2012.
Computational Text Analysis for Social Science: Model Complexity and Assumptions. Brendan O'Connor, David Bamman, and Noah A. Smith. In Proceedings of the NIPS Workshop on Computational Social Science and the Wisdom of Crowds, Sierra Nevada, Spain, December 2011.
Structured Sparsity in Structured Prediction. André F. T. Martins, Noah A. Smith, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), Edinburgh, UK, July 2011.
Dual Decomposition with Many Overlapping Components. André F. T. Martins, Noah A. Smith, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2011), Edinburgh, UK, July 2011.
An Augmented Lagrangian Approach to Constrained MAP Inference. André F. T. Martins, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Noah A. Smith, and Eric P. Xing. In Proceedings of the International Conference on Machine Learning (ICML 2011), Bellevue, WA, June/July 2011.
Author Age Prediction from Text using Linear Regression. Dong Nguyen, Noah A. Smith, and Carolyn P. Rosé. In Proceedings of the ACL Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LATECH 2011), Portland, OR, June 2011.
Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments. Kevin Gimpel, Nathan Schneider, Brendan O'Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, companion volume (ACL 2011), Portland, OR, June 2011.
Discovering Sociolinguistic Associations with Structured Sparsity. Jacob Eisenstein, Noah A. Smith, and Eric P. Xing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2011), Portland, OR, June 2011.
Online Learning of Structured Predictors with Multiple Kernels. André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2011), Fort Lauderdale, FL, April 2011.

Acknowledgments

This project is supported by the National Science Foundation (IIS-1054319).