Abstract: Automated grading of subjective questions remains a significant challenge in educational assessment. Traditional manual grading is inefficient and inconsistent, while existing AI-based methods lack flexibility and robustness in handling diverse answers. This paper introduces MASGrader, an innovative multi-agent framework for grading subjective answers, consisting of four agents: the Overview Agent, which performs macroscopic evaluation; the Detail Review Agent, which conducts microscopic reviews; the Logical Validation Agent, which checks semantic and logical consistency; and the Supervisory Agent, which coordinates debates and makes final decisions. MASGrader enhances grading accuracy, stability, and transparency by simulating human-like debate and reflection mechanisms. Experiments on a dataset of 500 subjective answers demonstrate that MASGrader improves weighted Kappa scores and accuracy by 5-10\% compared to a single-agent baseline while generating detailed scoring rationales that increase interpretability. By introducing dynamic collaboration, logical validation, and iterative self-improvement, this multi-agent framework provides a reliable solution for high-stakes educational assessment.
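The four-agent pipeline described above can be sketched as follows. This is a minimal illustrative sketch of the orchestration pattern only, not the authors' implementation: all class, function, and field names (e.g. `Verdict`, `supervisory_agent`, the 0-10 score scale) are assumptions, and the supervisory step is simplified to averaging rather than the paper's iterative debate.

```python
# Hypothetical sketch of the MASGrader pipeline described in the abstract.
# All names and the scoring scale are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float    # proposed score on an assumed 0-10 scale
    rationale: str  # scoring rationale, for interpretability


def overview_agent(answer: str) -> Verdict:
    # Macroscopic evaluation: overall relevance and coverage.
    return Verdict(score=8.0, rationale="covers the main points")


def detail_review_agent(answer: str) -> Verdict:
    # Microscopic review: factual details and terminology.
    return Verdict(score=7.0, rationale="minor factual slips")


def logical_validation_agent(answer: str) -> Verdict:
    # Semantic and logical consistency check.
    return Verdict(score=7.5, rationale="argument is coherent")


def supervisory_agent(verdicts: list[Verdict]) -> Verdict:
    # Coordinates the individual verdicts into a final decision.
    # Simplified here to an average plus a merged rationale; the
    # paper's version instead moderates a debate among the agents.
    score = sum(v.score for v in verdicts) / len(verdicts)
    rationale = "; ".join(v.rationale for v in verdicts)
    return Verdict(score=round(score, 2), rationale=rationale)


def grade(answer: str) -> Verdict:
    verdicts = [
        overview_agent(answer),
        detail_review_agent(answer),
        logical_validation_agent(answer),
    ]
    return supervisory_agent(verdicts)
```

In this sketch the three evaluator agents run independently and the supervisory agent produces both a final score and a combined rationale, mirroring the interpretability claim in the abstract.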
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: Automated Grading, Multi-Agent Systems, Subjective Question Answering, Interpretability
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English, Chinese
Submission Number: 3426