Agentic AI Framework for Low-Resource Essay Evaluation via Scoring, Explanation, and Debate

Surendrabikram Thapa, Kritesh Rauniyar, Shuvam Shiwakoti, Surabhi Adhikari, Junaid Rashid, Jungeun Kim, Usman Naseem

Published: 2025, Last Modified: 28 May 2026, Venue: IEEE Big Data 2025, License: CC BY-SA 4.0

Abstract: Automated essay scoring (AES) systems are widely used in educational assessment. The field has advanced considerably in recent years with foundation models such as large language models (LLMs). Despite these advances, most research remains confined to high-resource languages and struggles to capture the linguistic and stylistic nuances of low-resource contexts. To address this issue, this paper presents a multi-agent LLM framework for essay evaluation in Nepali, a language with limited computational resources and no prior evaluation benchmarks. Built using the CrewAI orchestration system, the framework coordinates four specialized agents (Scorer, Explainer, Debator, and Final Reviser) that iteratively assign, justify, critique, and consolidate essay scores through deliberative reasoning. To support this effort, we release NESA, the first expert-annotated dataset of 485 Nepali student essays graded on a 1-10 scale, and establish a standardized benchmark for Nepali AES. We benchmark both zero-shot baselines and the agentic workflow. Experiments across multiple open-source LLMs demonstrate that the agentic workflow enhances both scoring consistency and interpretability, achieving up to a 7% improvement in Quadratic Weighted Kappa (QWK) over zero-shot prompting baselines. The framework also improves transparency by exposing interpretable intermediate reasoning between agents, narrowing the gap between model output and human grading behavior. These results illustrate the potential of multi-agent collaboration to address fairness and reliability challenges in low-resource educational assessment. By integrating structured reasoning and collaborative evaluation, this work advances equitable, explainable, and reproducible educational AI for underrepresented languages and establishes a foundation for future multilingual AES research. The dataset and accompanying resources are publicly available at https://github.com/therealthapa/NESA.
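For readers unfamiliar with the reported metric, the following is a minimal sketch of the standard Quadratic Weighted Kappa definition, assuming integer scores on the 1-10 scale described in the abstract. The function name `quadratic_weighted_kappa` and its parameters are hypothetical and chosen for illustration; this is not taken from the authors' evaluation code, which may differ in details such as score binning.

```python
def quadratic_weighted_kappa(human, model, min_score=1, max_score=10):
    """Standard QWK between two lists of integer scores (illustrative sketch)."""
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")

    n = max_score - min_score + 1  # number of score categories
    total = len(human)

    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    # Marginal histograms of each rater's scores.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic penalty
            expected = hist_human[i] * hist_model[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0:
        # Both raters used one identical category for every essay.
        return 1.0
    return 1.0 - numerator / denominator
```

Under this definition, identical score lists give a value of 1, chance-level agreement gives a value near 0, and the quadratic weight penalizes a model score that is far from the human score more heavily than one that is off by a single point.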

External IDs: dblp:conf/bigdataconf/ThapaRSARKN25