MAGIC: Multi-Agent Argumentation and Grammar Integrated Critiquer

ACL ARR 2025 May Submission6211 Authors

20 May 2025 (modified: 29 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Automated Essay Scoring (AES) and Automatic Essay Feedback (AEF) systems aim to reduce the workload of human raters in educational assessment. However, most existing systems prioritize numeric scoring accuracy over the quality of feedback. This paper presents Multi-Agent Argumentation and Grammar Integrated Critiquer (MAGIC), a framework in which multiple specialized agents each evaluate a distinct aspect of writing, both to predict holistic scores and to produce detailed, rubric-aligned feedback. To support evaluation, we curated a novel dataset of past GRE practice test essays with expert-evaluated scores and feedback. MAGIC outperforms baseline models in essay scoring, as measured by Quadratic Weighted Kappa (QWK). We find that, despite the improvement in QWK, there are opportunities for future work in aligning LLM-generated feedback with human preferences.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: educational applications, essay scoring, multimodal applications, NLP for social good

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 6211
