When AI explains in natural language: Unveiling the impact of generative AI explanations on educators’ grading and feedback practices

Yuheng Li, Zirui Shan, Mladen Raković, Quanlong Guan, Dragan Gašević, Guanliang Chen

Published: 01 Nov 2025, Last Modified: 15 Jan 2026 · Education and Information Technologies · CC BY-SA 4.0
Abstract: Grading and providing feedback on students’ open-ended responses are time-consuming and cognitively demanding. Despite advocacy for leveraging Artificial Intelligence (AI) to automate assessment, concerns persist regarding the effects of inaccurate AI assessment on student learning and the potential detrimental effects of over-reliance on educators’ assessment practices. One alternative is to use AI-powered assessment insights as auxiliary information to support educators’ assessment, rather than entrusting assessment solely to AI. However, the insights reported in existing literature typically require excessive cognitive effort from human graders to interpret and use. The capability of Generative AI (GenAI) technologies to produce natural-language assessment explanations could potentially address this challenge. To empirically examine the efficacy of such natural-language insights while accounting for concerns about over-reliance on AI, we invited 60 human graders from diverse backgrounds to participate in multiple phases of assessment of secondary students’ short-answer responses. Participants were assigned to one of three conditions: no AI-powered assessment support; important-word highlights from AI-powered graders as support; or natural-language grading explanations from GenAI-powered graders as support. Mixed-effects regression analyses were used to examine the impacts of the different AI-powered assessment insights on human assessment practices, both in the current assessment with AI-powered insights presented and in a later assessment without them. Our findings indicate that GenAI-enabled natural-language insights significantly improved educators’ feedback quality compared to educators without AI support (\(\beta = 0.190\), \(p = 0.010\)), whereas important-word highlights from traditional AI graders had a negligible impact on educators’ feedback quality.
Although significant improvements in grading accuracy were not observed, natural-language insights showed greater potential to enhance grading accuracy (\(\beta = 0.556\), \(p = 0.093\)) than important-word highlights (\(\beta = -0.060\), \(p = 0.848\)). Furthermore, educators reported significantly greater satisfaction with, and willingness to adopt, the natural-language insights in practice compared to the important-word insights (\(p < 0.05\), \(r \in [0.352, 0.490]\)). Finally, prior exposure to AI-powered insights to some extent fostered more effective assessment practices in subsequent assessment activities without AI support, though further longitudinal research is needed to establish statistical significance.