Are Rubrics All You Need? Towards Flexible Rubric-based Automatic Short-Answer Scoring via Attention-based Span Alignment and Pairwise Ranking

ACL ARR 2025 May Submission2901 Authors

19 May 2025 (modified: 03 Jul 2025). License: CC BY 4.0
Abstract: In educational assessment, scoring rubrics are an essential part of the practitioner's toolbox because they define the exact criteria for scoring learner responses. However, in past NLP research on automatic short-answer scoring, scoring rubrics have rarely been used as explicit scoring references and, when used, have mostly been treated as supplementary input. In this study, we explore different possible implementations of rubric-based short-answer scoring in which models are explicitly conditioned to use a provided rubric as a scoring reference. For this purpose, we propose GRASP, a novel pointer-based architecture that uses bilinear attention to predict the alignment between pooled span embeddings of student answers and rubric criteria from a single encoder forward pass. Moreover, we explore SBERT and Cross-Encoders for pairwise ranking, and include five-shot prompting of generative LLMs as a baseline. We compare all methods on a novel German short-answer scoring dataset and the established English ASAP-SAS. Results reveal that the effectiveness of the different methods depends on the nature of the dataset. On ASAP-SAS, pairwise ranking achieves competitive performance close to the state of the art, while GRASP underperforms. On the German dataset, however, this picture is reversed: GRASP significantly outperforms the other methods and generalises better to unseen questions.
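The abstract's core mechanism, scoring the alignment between answer spans and rubric criteria with bilinear attention, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tensor names, dimensions, and the single learned interaction matrix `W` are all assumptions for the sake of the example.

```python
import torch

torch.manual_seed(0)
d = 8                        # embedding dimension (illustrative)
n_spans, n_criteria = 4, 3   # spans per answer, criteria per rubric (illustrative)

# Pooled span embeddings of a student answer and rubric criterion
# embeddings, assumed to come from a single encoder forward pass.
S = torch.randn(n_spans, d)      # (n_spans, d)
C = torch.randn(n_criteria, d)   # (n_criteria, d)

# Learned bilinear interaction matrix (hypothetical parameter).
W = torch.randn(d, d, requires_grad=True)

# Bilinear attention scores: scores[i, j] = S[i] @ W @ C[j]
scores = S @ W @ C.T             # (n_spans, n_criteria)

# A softmax over criteria yields, for each span, a pointer-style
# distribution over which rubric criterion it aligns with.
alignment = torch.softmax(scores, dim=-1)
print(alignment.shape)  # torch.Size([4, 3])
```

Each row of `alignment` is a probability distribution over rubric criteria for one answer span; a pointer-based scorer could then aggregate these alignments into a final score.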
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: educational applications, NLP datasets, short-answer scoring, generalization, fine-tuning, corpus creation, evaluation
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: German, English
Submission Number: 2901