This note refers to the following technical report.

M. Heilman and N. A. Smith. 2009. Question Generation via Overgenerating Transformations and Ranking. Language Technologies Institute, Carnegie Mellon University Technical Report CMU-LTI-09-013.

After the original release of the technical report, a mistake in the experiments was discovered.

The experiments compared two types of rankers. One, called the "aggregate" ranker, used regressions trained to identify specific types of deficiencies in questions (e.g., ungrammaticality, vagueness). The other, called the "boolean" ranker, used a single regression on whether a question was acceptable or not. The report presented results indicating that the aggregate ranker generally performed better than the boolean ranker. However, we later found a bug in an evaluation script that was causing the boolean ranker to perform poorly. Thus, the poor results reported for the boolean approach should be ignored.

During subsequent development of the system, we found that a correctly working version of the boolean approach performed as well as, if not slightly better than, the aggregate version. The boolean approach therefore seems preferable because of its good performance and simplicity. Note also that the boolean approach does not require annotation of multiple deficiencies; it can instead be trained on ratings of holistic question acceptability, which are probably easier to gather reliably.
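
As a rough illustration of how the two rankers differ structurally, the following Python sketch scores a few candidate questions both ways. It is not the system's implementation. The logistic form of each regression, the weights, the feature vectors, and especially the rule for combining per-deficiency scores (a product that treats deficiencies as independent) are all assumptions made for this example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def model_probability(weights, features):
    """Hypothetical linear model followed by a sigmoid (assumed form)."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def boolean_score(acceptability_weights, features):
    """Boolean ranker: one model estimates whether the question is acceptable."""
    return model_probability(acceptability_weights, features)


def aggregate_score(deficiency_weights, features):
    """Aggregate ranker: one model per deficiency.

    Assumption: the per-deficiency estimates are combined as the product
    of (1 - P(deficiency)), i.e. deficiencies are treated as independent.
    """
    score = 1.0
    for weights in deficiency_weights.values():
        score *= 1.0 - model_probability(weights, features)
    return score


def rank(questions, score_fn):
    """Return questions sorted from highest to lowest score."""
    return sorted(questions, key=lambda q: score_fn(q["features"]), reverse=True)


# Hypothetical candidate questions with made-up two-dimensional features.
questions = [
    {"text": "q1", "features": [1.0, 0.0]},
    {"text": "q2", "features": [0.0, 1.0]},
    {"text": "q3", "features": [1.0, 1.0]},
]

# Hypothetical weights; a real system would learn these from annotated data.
acceptability_weights = [2.0, -1.0]
deficiency_weights = {
    "ungrammatical": [-2.0, 1.0],
    "vague": [0.0, 0.5],
}

boolean_ranking = rank(
    questions, lambda f: boolean_score(acceptability_weights, f)
)
aggregate_ranking = rank(
    questions, lambda f: aggregate_score(deficiency_weights, f)
)

print([q["text"] for q in boolean_ranking])
print([q["text"] for q in aggregate_ranking])
```

The sketch makes the annotation difference concrete: the boolean ranker needs only one set of weights learned from holistic acceptability ratings, whereas the aggregate ranker needs a separate set of weights, and therefore separate labels, for each deficiency type.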

As a final note, the most complete description of the question generation system is the first author's 2011 dissertation, titled "Automatic Factual Question Generation from Text."
