CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring


ACL ARR 2026 January Submission877 Authors

25 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0

Keywords: Automated Essay Scoring, Multi-Agent System, Multimodal Large Language Models

Abstract: Automated Essay Scoring (AES) is crucial for modern education, particularly as multimodal assessments become increasingly prevalent. However, traditional AES methods struggle with evaluation generalizability and multimodal perception, and even recent Multimodal Large Language Model (MLLM)-based approaches can produce hallucinated justifications and scores misaligned with human judgment. To address these limitations, we introduce **CAFES**, the first collaborative multi-agent framework specifically designed for AES. It orchestrates three specialized agents: an Initial Scorer that produces rapid, trait-specific evaluations; a Feedback Pool Manager that aggregates detailed, evidence-grounded feedback; and a Reflective Scorer that iteratively refines scores based on this feedback to improve alignment with human judgment. Extensive experiments with widely adopted MLLMs show that CAFES achieves an average relative improvement of 21% in Quadratic Weighted Kappa (QWK) against ground truth, with particularly strong gains in grammatical and lexical diversity. CAFES paves the way toward intelligent multimodal AES systems. The code and dataset are available at https://anonymous.4open.science/r/CAFES-C87F/.
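The three-agent workflow described in the abstract can be pictured as a score, critique, and revise loop. What follows is a hypothetical illustration under stated assumptions, not the CAFES implementation. Each agent is a plain Python callable that stands in for an MLLM call, essay images are omitted for brevity, trait names such as `"grammar"` are placeholders, and the stopping rule (halt when the Reflective Scorer leaves every score unchanged, or after `max_rounds`) is our own choice.

```python
from typing import Callable, Dict, List, Tuple

# Trait name -> integer score, e.g. {"grammar": 3}. Trait names are placeholders.
Scores = Dict[str, int]

# Hypothetical agent signatures; in a real system each would wrap an MLLM call.
InitialScorer = Callable[[str], Scores]
FeedbackManager = Callable[[str, Scores], List[str]]
ReflectiveScorer = Callable[[str, Scores, List[str]], Scores]


def run_pipeline(
    essay: str,
    initial_scorer: InitialScorer,
    feedback_manager: FeedbackManager,
    reflective_scorer: ReflectiveScorer,
    max_rounds: int = 3,
) -> Tuple[Scores, int]:
    """Score, gather feedback, refine. Returns (final scores, reflective rounds run)."""
    scores = initial_scorer(essay)  # fast, trait-specific first pass
    rounds = 0
    for _ in range(max_rounds):
        feedback = feedback_manager(essay, scores)  # evidence-grounded comments
        revised = reflective_scorer(essay, scores, feedback)
        rounds += 1
        if revised == scores:  # assumed convergence rule: nothing changed
            break
        scores = revised
    return scores, rounds
```

With stub agents where the reflective step raises each score by one up to a cap of 4, starting from `{"grammar": 2}`, the loop returns `({"grammar": 4}, 3)`: two revising rounds followed by one round that changes nothing. Keeping the agents as injected callables makes the orchestration testable without any model access.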
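QWK, the agreement metric reported above, compares two integer rating sequences and penalizes each disagreement by the squared distance between the two scores. It is defined as κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i − j)² / (k − 1)², where O is the observed rating matrix, E is the matrix expected from the two raters' marginal histograms, and k is the number of score levels. Below is a minimal standard-library sketch of that textbook definition. The function name and the assumption of a contiguous integer score range are ours, and the paper's evaluation script may differ; for example, it could rely on a library routine such as scikit-learn's `cohen_kappa_score(..., weights="quadratic")`.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    min_rating: int,
    max_rating: int,
) -> float:
    """Quadratic weighted kappa between two integer rating sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if max_rating <= min_rating:
        raise ValueError("max_rating must exceed min_rating")
    k = max_rating - min_rating + 1
    n = len(rater_a)

    # Observed matrix O[i][j] and the marginal histogram of each rater.
    observed = [[0] * k for _ in range(k)]
    hist_a = [0] * k
    hist_b = [0] * k
    for a, b in zip(rater_a, rater_b):
        if not (min_rating <= a <= max_rating and min_rating <= b <= max_rating):
            raise ValueError("rating outside [min_rating, max_rating]")
        i, j = a - min_rating, b - min_rating
        observed[i][j] += 1
        hist_a[i] += 1
        hist_b[j] += 1

    numerator = 0.0    # sum of w_ij * O_ij
    denominator = 0.0  # sum of w_ij * E_ij, with E_ij = hist_a[i] * hist_b[j] / n
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            numerator += w * observed[i][j]
            denominator += w * hist_a[i] * hist_b[j] / n
    if denominator == 0.0:
        # Only possible when both raters give one identical score everywhere.
        return 1.0
    return 1.0 - numerator / denominator
```

For instance, identical ratings give 1.0, and perfectly reversed ratings such as `[1, 2, 3, 4]` vs. `[4, 3, 2, 1]` on a 1 to 4 scale give -1.0 (up to floating-point rounding). For a two-level scale the quadratic weights reduce to unweighted Cohen's kappa.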

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: Automated Essay Scoring, Multi-Agent System, Multimodal Large Language Models

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 877
