Evaluating LLM-Generated Q&A Test: A Student-Centered Study

Anna Wróblewska, Bartosz Grabek, Jakub Swistak, Daniel Dan

Published: 2025, Last Modified: 16 Mar 2026, Venue: AIED (2) 2025, License: CC BY-SA 4.0

Abstract: This research develops an automatic pipeline for generating reliable question-answer (Q&A) tests with AI chatbots. We automatically generated a GPT-4o-based Q&A test for a Natural Language Processing course and evaluated its psychometric properties and perceived quality with students and experts. A mixed-format item response theory (IRT) analysis showed that the generated items exhibit strong discrimination and appropriate difficulty, while star ratings from students and experts reflect high overall quality. A uniform differential item functioning (DIF) check flagged two items for review. These findings demonstrate that LLM-generated assessments can match human-authored tests in psychometric performance and user satisfaction, illustrating a scalable approach to AI-assisted assessment development.
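The abstract refers to item discrimination, item difficulty, and a uniform DIF check without naming the specific models. The sketch below illustrates these terms under stated assumptions. It uses a dichotomous two-parameter logistic (2PL) model for a single item and the Mantel-Haenszel procedure, one common uniform-DIF method. Neither is confirmed as the authors' method. A mixed-format analysis would also need a polytomous model for non-binary items, which is not shown here. All function names, item parameters, and counts are hypothetical.

```python
import math
from typing import Iterable, Tuple

# Hypothetical layout: one 2x2 table per ability stratum (e.g., a total-score band):
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | theta) = 1 / (1 + exp(-a * (theta - b))).

    a: discrimination (larger a gives a steeper curve; the slope at theta == b is a / 4).
    b: difficulty (the ability level at which P(correct) = 0.5).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mh_common_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel common odds ratio pooled across ability strata.

    A value near 1.0 suggests no uniform DIF. Values above 1.0 mean the item
    favors the reference group at matched ability.
    """
    num = 0.0
    den = 0.0
    for ref_ok, ref_bad, foc_ok, foc_bad in strata:
        n = ref_ok + ref_bad + foc_ok + foc_bad
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += ref_ok * foc_bad / n
        den += ref_bad * foc_ok / n
    if den == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return num / den


def mh_d_dif(strata: Iterable[Stratum]) -> float:
    """ETS delta-scale effect size: MH D-DIF = -2.35 * ln(alpha_MH).

    Negative values indicate the item is harder for the focal group.
    """
    return -2.35 * math.log(mh_common_odds_ratio(strata))


if __name__ == "__main__":
    # Hypothetical item parameters, not values from the study.
    print(p_correct_2pl(0.0, a=1.5, b=0.0))  # 0.5, since theta == b
    # A more discriminating item separates examinees above b more sharply.
    print(p_correct_2pl(1.0, a=2.0, b=0.0) > p_correct_2pl(1.0, a=0.5, b=0.0))  # True
    # Hypothetical counts in which the reference group does better at equal ability.
    strata = [(20, 10, 10, 20), (30, 5, 20, 15)]
    print(mh_common_odds_ratio(strata), mh_d_dif(strata))
```

In practice, an item "flagged for review" would be one whose pooled odds ratio departs clearly from 1.0, usually judged with a significance test and an effect-size rule. Those decision rules are omitted from this sketch.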

External IDs: dblp:conf/aied/WroblewskaGSD25