Current Evaluation Methods are a Bottleneck in Automatic Question Generation

Published: 14 Dec 2023, Last Modified: 04 Jun 2024
Venue: AI4ED-AAAI-2024 (Day 1 Poster)
License: CC BY 4.0
Track: Innovations in AI for Education (Day 1)
Paper Length: long-paper (6 pages + references)
Keywords: automatic question generation, evaluation methods, machine translation, crowdsourcing, human evaluators, ablation studies
TL;DR: This paper discusses current evaluation methods for assessing the quality of automatically generated questions, along with their advantages and limitations.
Abstract: This study provides a comprehensive review of the evaluation methods most frequently used to assess the quality of automatic question generation (AQG) systems built on computational linguistics techniques and large language models. In surveying the current state of evaluation, we discuss the advantages and limitations of each method. We then outline the next steps toward fully integrating AQG systems into educational settings to achieve effective personalization and adaptation.
Cover Letter: PDF
Submission Number: 63