Syllabus (2008)
Summary
This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning and computer science students who want to know about current problems and solutions in text processing.
Students will, upon completion, understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models for problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and *ACL). A recurring theme will be the tradeoffs between computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout.
Target
The course is designed for SCS graduate students. Prerequisite: Language and Statistics (11-761) or permission of the instructor. Recommended: Algorithms for Natural Language Processing (11-711), Machine Learning (15-681, 15-781, or 11-746).
Evaluation
Students will be evaluated in four ways.
Literature review (30%)
Each student will complete a critical literature review on a pre-selected set of five recent (past two years) papers published in major NLP conference proceedings or journals. The papers should be on a single topic that is related to the course. The review is not intended to merely summarize the five papers, but rather to explain their contributions in context, referring as appropriate to earlier research on the problem. This exercise is intended to promote deeper understanding of one research problem in empirical NLP and to hone each student's research reading, writing, and critical analysis skills. It is acceptable for the papers to be in an area close to your research, but the papers cannot simply be five papers you've already read. You may not select papers written by you or your advisor.
Depending on enrollment, the literature review will be done individually or in groups of two.
To encourage effective written communication, there will be two rounds of evaluation. A draft will be due a few weeks before the final deadline, so that feedback from the instructor can improve the quality of the final document.
Oral presentation and discussion (20%)
Each student will give a roughly 20-minute oral presentation on his or her literature review. (The presentation time may be changed depending on enrollment and time constraints.) A period of discussion will follow, in which we will aim to find connections between student topics. The driving questions will be: What can be borrowed from one area and applied to another? And what challenges are not being met by current methods?
Assignments (30%)
Small projects with a programming component will be assigned about every three weeks. Typically some or all of the software will be provided, and students will be expected to run experiments or extend the implementation. To encourage creative exploration, some projects may be graded competitively.
Final exam (20%)
A final written exam will be given to test basic competence with the technical material covered in the lectures.
Topics
An outline follows. Numbers in parentheses are the number of lectures. Starred items will be postponed or canceled if the schedule lags. Everything is subject to change.
- Background
  - Philosophy: the empirical way of thinking about language (1)
  - Evaluation, experimentation, and hypothesis testing (1)
  - Numerical optimization crash course (1)
  - Weighted dynamic programming (2)
- Supervised NLP
  - Stochastic models of sequences: Markov models, hidden Markov models, and related algorithms (2)
  - Log-linear/exponential/maximum entropy models, conditional estimation, CRFs, regularization (3)
  - Stochastic and weighted context-free grammars, statistical parsing with CFGs, dependency parsing (3)
  - Other discriminative methods for structured data: perceptron, boosting, maximizing the margin, online methods (2)
- Semi- and Unsupervised NLP
  - EM and word clustering (2)
  - EM for NL models, contrastive estimation (2)
  - Variational Bayesian inference and learning for NL models (2)
  - Nonparametric Bayesian methods (1)
  - Combining labeled and unlabeled data: Yarowsky algorithms, self-training, co-training, generalized expectation (2)
  - Partially labeled data: weighted finite-state machines and transducers, learning in machine translation (1)
The final lectures of the course will be devoted to the oral presentations and discussion.
Readings
Manning and Schütze’s Foundations of Statistical Natural Language Processing will be recommended for background reading during parts of the course, though many of the techniques taught postdate that book. Readings will be suggested from recent conference papers and journal articles, and perhaps also from chapters of Jurafsky & Martin’s Speech and Language Processing (second edition strongly recommended), MacKay’s Information Theory, Inference, and Learning Algorithms, Klavans and Resnik’s The Balancing Act, or other texts. No particular readings will be mandatory, though a great deal of reading will be required to complete the literature review. See External resources for some useful links.
