Beyond the Gold Standard in Analytic Automated Essay Scoring

Published: 22 Jun 2025, Last Modified: 22 Jun 2025
Venue: ACL-SRW 2025 Poster
License: CC BY 4.0
Keywords: Natural Language Processing, Automated Essay Scoring, Analytic Scoring, Feedback Generation, Perspectivism
TL;DR: Automated Essay Scoring is shifting from holistic to analytic scoring for better feedback, but rater variability is a challenge. We propose an approach that learns from individual raters instead of gold-standard labels to improve trustworthiness.
Abstract: Originally developed to reduce the manual burden of grading standardised language tests, Automated Essay Scoring (AES) research has long focused on holistic scoring methods, which offer minimal formative feedback in the classroom. With the increasing demand for technological tools that support language acquisition, the field is turning to analytic AES (evaluating essays according to different linguistic traits). This approach holds promise for generating more detailed essay feedback, but it relies on analytic scoring data that is both more cognitively demanding for humans to produce and more prone to bias. The dominant paradigm in AES is to aggregate disagreements between raters into a single gold-standard label, which fails to account for genuine examiner variability. To make AES more representative and trustworthy, we propose to explore the sources of these disagreements and lay out a novel AES system design that learns from individual raters instead of gold-standard labels.
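The abstract only sketches what "learning from individual raters" might mean in practice. One common way to realise this idea, shown below as a minimal PyTorch sketch, is a shared essay encoder with a dedicated scoring head per rater and a loss masked to the raters who actually scored each essay. All names, dimensions, and the architecture here are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class MultiRaterAnalyticScorer(nn.Module):
    """Shared essay encoder with one scoring head per individual rater.

    Instead of regressing onto a single aggregated gold score, each head
    learns the scoring behaviour of one rater, so systematic rater
    differences are modelled rather than averaged away.
    """

    def __init__(self, feature_dim: int, num_raters: int, num_traits: int):
        super().__init__()
        # Placeholder encoder: in practice this would be a pretrained
        # text encoder (e.g. a transformer) producing essay embeddings.
        self.encoder = nn.Sequential(nn.Linear(feature_dim, 256), nn.ReLU())
        # One linear head per rater, each predicting all trait scores.
        self.rater_heads = nn.ModuleList(
            nn.Linear(256, num_traits) for _ in range(num_raters)
        )

    def forward(self, essay_features: torch.Tensor) -> torch.Tensor:
        h = self.encoder(essay_features)
        # Output shape: (batch, num_raters, num_traits)
        return torch.stack([head(h) for head in self.rater_heads], dim=1)


def masked_rater_loss(pred, target, mask):
    """MSE computed only where a rater actually scored an essay.

    Essays are typically scored by a subset of raters, so `mask` marks
    which (essay, rater, trait) cells carry a real label.
    """
    squared_error = (pred - target) ** 2 * mask
    return squared_error.sum() / mask.sum().clamp(min=1)


# Minimal usage with random features standing in for encoder input.
model = MultiRaterAnalyticScorer(feature_dim=768, num_raters=3, num_traits=4)
x = torch.randn(8, 768)                           # 8 essays
scores = torch.randint(1, 6, (8, 3, 4)).float()   # rater scores on a 1-5 scale
mask = (torch.rand(8, 3, 4) > 0.5).float()        # cells that were actually rated
loss = masked_rater_loss(model(x), scores, mask)
loss.backward()
```

At inference time, a design like this can report per-rater predictions (or their spread) rather than a single score, which is one route to the representativeness and trustworthiness the abstract argues for.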
Archival Status: Archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 8