Abstract: Automated Essay Scoring (AES) is crucial for modern education, particularly with the increasing prevalence of multimodal assessments.
However, traditional AES methods struggle with evaluation generalizability and multimodal perception, while even recent Multimodal Large Language Model (MLLM)-based approaches can produce hallucinated justifications and scores misaligned with human judgment. To address these limitations, we introduce CAFES, the **first collaborative multi-agent framework specifically designed for AES**.
It orchestrates three specialized agents: an initial scorer for rapid, trait-specific evaluations; a feedback pool manager that aggregates detailed, evidence-based strengths; and a reflective scorer that iteratively refines scores based on this feedback to improve alignment with human judgment. Extensive experiments using state-of-the-art MLLMs show an average relative improvement of 21% in Quadratic Weighted Kappa (QWK) against ground truth, with especially strong gains on grammatical and lexical-diversity traits. CAFES paves the way for intelligent multimodal AES systems.
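The three-agent pipeline described in the abstract can be sketched as follows. This is a minimal illustration only: all function names, the trait list, the 1–5 rubric range, and the number of reflection rounds are assumptions, and the MLLM calls inside each agent are replaced with trivial placeholders.

```python
# Hypothetical sketch of a CAFES-style three-agent scoring loop.
# Real agents would call an MLLM; here each step is a simple stub.

TRAITS = ["content", "organization", "grammar", "lexical_diversity"]

def initial_scorer(essay: str) -> dict:
    """Rapid trait-specific evaluation (stub: word-count placeholder)."""
    base = min(5, max(1, len(essay.split()) // 20))
    return {trait: base for trait in TRAITS}

def feedback_pool(essay: str, scores: dict) -> list:
    """Aggregate detailed, evidence-based feedback per trait (stub)."""
    return [f"{trait}: scored {score}; cite supporting evidence from the essay"
            for trait, score in scores.items()]

def reflective_scorer(scores: dict, feedback: list, rounds: int = 2) -> dict:
    """Iteratively refine scores against pooled feedback (stub refinement)."""
    refined = dict(scores)
    for _ in range(rounds):
        # A real reflective agent would revise each trait score using the
        # feedback; this stub only clamps scores to the rubric range.
        refined = {t: min(5, max(1, s)) for t, s in refined.items()}
    return refined

essay = "This is a short sample essay about multimodal assessment. " * 10
scores = initial_scorer(essay)
final = reflective_scorer(scores, feedback_pool(essay, scores))
print(final)
```

The key design choice the abstract implies is the separation of concerns: fast per-trait scoring first, evidence aggregation second, and iterative refinement last, so that the final scores are grounded in pooled feedback rather than a single forward pass.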
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Automated Essay Scoring, Multi-Agent System, Multimodal Large Language Models
Contribution Types: Model analysis & interpretability
Languages Studied: English
Keywords: Automated Essay Scoring, Multi-Agent System, Multimodal Large Language Models
Submission Number: 294