\textit{D-GEN}: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models

ACL ARR 2025 February Submission 1541 Authors

13 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract:

Evaluating generative models with open-ended generation is challenging due to inconsistencies in response formats. Multiple-choice (MC) evaluation mitigates this issue, but generating high-quality distractors is time-consuming and labor-intensive. We introduce \textit{D-GEN}, the first open-source distractor generator model that transforms open-ended data into an MC format. To evaluate distractor quality, we propose two novel methods: 1) ranking alignment, which checks that generated distractors retain the discriminatory power of ground-truth distractors, and 2) entropy analysis, which compares model confidence distributions. Our results show that \textit{D-GEN} preserves ranking consistency (Spearman's $\rho = 0.99$, Kendall's $\tau = 0.94$) and closely matches the entropy distribution induced by ground-truth distractors. Human evaluation further confirms that the generated distractors are fluent, coherent, distracting, and reliably incorrect. Our work advances robust and efficient distractor generation with automated evaluation, setting a new standard for MC evaluation.
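The submission page does not include the evaluation code, but the two proposed checks are straightforward to sketch. Below is a minimal, hypothetical illustration assuming per-model MC accuracies (for ranking alignment) and per-question option probabilities (for entropy analysis) have already been computed; all variable names and numbers are placeholders, not values from the paper.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau, entropy

# --- Ranking alignment (hypothetical accuracies) ---
# MC accuracy of each evaluated model on ground-truth distractors
# vs. the same models on D-GEN-generated distractors.
acc_ground_truth = [0.82, 0.75, 0.69, 0.61, 0.55]
acc_generated    = [0.80, 0.76, 0.66, 0.62, 0.53]

rho, _ = spearmanr(acc_ground_truth, acc_generated)
tau, _ = kendalltau(acc_ground_truth, acc_generated)
print(f"Spearman's rho = {rho:.2f}, Kendall's tau = {tau:.2f}")

# --- Entropy analysis (hypothetical option probabilities) ---
# A model's confidence distribution over the four options for one
# question, under ground-truth vs. generated distractors.
probs_ground_truth = np.array([0.70, 0.12, 0.10, 0.08])
probs_generated    = np.array([0.66, 0.14, 0.11, 0.09])

h_gt  = entropy(probs_ground_truth)   # Shannon entropy (nats)
h_gen = entropy(probs_generated)
print(f"entropy: ground truth = {h_gt:.3f}, generated = {h_gen:.3f}")
```

In this framing, high rank correlation means the generated distractors order models the same way the ground-truth ones do, and similar entropy means they induce comparable uncertainty in the models being evaluated.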

Paper Type: Long
Research Area: Generation
Research Area Keywords: efficient models, automatic evaluation, analysis, human evaluation
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 1541