Evaluating for Diversity in Question Generation over Text

CoRR 2020 (modified: 04 Nov 2022)
Abstract: Generating diverse and relevant questions over text is a task with widespread applications. We argue that commonly-used evaluation metrics such as BLEU and METEOR are not suitable for this task due to the inherent diversity of reference questions, and propose a scheme for extending conventional metrics to reflect diversity. We furthermore propose a variational encoder-decoder model for this task. We show through automatic and human evaluation that our variational model improves diversity without loss of quality, and demonstrate how our evaluation scheme reflects this improvement.