Syllabus (2007)

Summary

This course will cover modern empirical methods in natural language processing. It is designed for language technologies students who want to understand statistical methodology in the language domain, and for machine learning students who want to know about current problems and solutions in text processing.

Upon completion, students will understand how statistical modeling and learning can be applied to text, be able to develop and apply new statistical models for problems in their own research, and be able to critically read papers from the major related conferences (EMNLP and *ACL). A recurring theme will be the tradeoffs between computational cost, mathematical elegance, and applicability to real problems. The course will be organized around methods, with concrete tasks introduced throughout.

Target

The course is designed for SCS graduate students. Prerequisite: Language and Statistics (11-761) or permission of the instructor. Recommended: Algorithms for Natural Language Processing (11-711), Machine Learning (15-681, 15-781, or 11-746).


Evaluation

Students will be evaluated in four ways.

Literature review (30%)

Each student will complete a literature review, individually or (with permission) in a pair, on a problem within natural language processing that interests him/her. The literature review is expected to be comprehensive and include a problem definition, evaluation, a discussion of available datasets, and a thorough, coherent discussion of existing techniques. Where possible, the review should compare different techniques. Current obstacles should be discussed, ideally with insights on tackling or avoiding them. Implementation is not required for this literature review. Past topics include: question answering, morphology induction and modeling, syntax-based machine translation, syntax-based language modeling, optimality theory, and Bayesian topic modeling. Suggested topics might include textual entailment, paraphrase, data-oriented models (DOP), coreference resolution, sentiment analysis, and summarization. Each student will take on a different topic. To encourage effective written communication, there will be two rounds of evaluation. A draft will be due a few weeks before the final deadline, so that feedback from the instructor can improve the quality of the final document.

Oral presentation and discussion (20%)

Each student will give a ~20-minute oral presentation on his/her literature review. A period of discussion will follow, in which we will aim to find connections between student topics. The driving questions will be: What can be borrowed from one area and applied to another? And what challenges are not being met by current methods?

Assignments (30%)

Small projects with a programming component will be assigned about every three weeks. Typically some or all of the software will be available, and students will be expected to run experiments or extend the implementation. To encourage creative exploration, some projects may be graded competitively.

Final exam (20%)

A final written exam will be given to test basic competence with the technical material covered in the lectures.

Topics

An outline follows. The number in parentheses after each topic is the number of lectures devoted to it. Starred items will be postponed or canceled if the schedule lags.


  1. Philosophy: the empirical way of thinking about language. (1)
  2. Stochastic models for sequences: Markov models, hidden Markov models, and related algorithms. (2)
  3. Log-linear/exponential/maximum entropy models, conditional estimation, CRFs, regularization, and convex optimization. (3)
  4. Weighted finite-state machines and transducers. (2)
  5. Stochastic and weighted context-free grammars and statistical parsing. (4)
  6. Weighted dynamic programming. (2)
  7. Discriminative training: perceptron, boosting, maximum margin estimation. (2)
  8. Unsupervised learning: clustering and EM, clustering words. (1)
  9. The EM algorithm for structured models, with hidden and partially hidden data; contrastive estimation. (2)
  10. Bayesian methods in NLP. (1)
  11. Semisupervised learning: Yarowsky algorithms, co-training. (1)
  12. Experimentation and hypothesis testing. (1)

The final lectures of the course will be devoted to the oral presentations and discussion.


Readings

Manning and Schütze’s Foundations of Statistical Natural Language Processing will be recommended for background reading during parts of the course, though many of the techniques taught postdate that book. Readings will be suggested from recent conference papers and journal articles, and perhaps also chapters from Jurafsky & Martin’s Speech and Language Processing, MacKay’s Information Theory, Inference, and Learning Algorithms, Klavans and Resnik’s The Balancing Act, or other texts. No particular readings will be mandatory, though a great deal of reading will be required for completion of the literature review. See External resources for some useful links.