Abstract: Evaluation of student answer scripts for multiple in-semester and end-semester assessments is beset with the challenges of scale, fairness, and timeliness of providing meaningful feedback to students to facilitate their learning. The time-consuming and bias-prone nature of manual evaluation has necessitated the development of automated grading systems. This work presents a solution that incorporates automated, near-real-time digital evaluation of students’ descriptive answer scripts using multi-modal Large Language Models (LLMs) across various university courses. The methodology involves a three-stage sequential process: setting up Retrieval Augmented Generation (RAG), context-aware rubric generation, and context-aware evaluation. We use RAG to provide context to the LLMs, automating rubric composition for each question and evaluating answers submitted in hybrid mode (handwritten or digitally keyed in) in near real time. Incorporating input from one or more course instructors on the LLM-generated rubric ensures that students receive meaningful feedback that facilitates learning. The results demonstrate the efficacy of our workflow on multiple undergraduate and graduate courses taken by students majoring in several Engineering disciplines, offering a promising solution to the challenge of providing scalable, automated assessments with feedback that supports learning.
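To make the three-stage flow concrete, the sketch below outlines one possible shape of such a pipeline. It is an illustrative assumption, not the authors' implementation: the `llm` callable stands in for any multi-modal LLM API, the keyword-overlap `retrieve` function is a toy placeholder for an embedding-based RAG index, and transcription of handwritten scripts is abstracted into the `answer_text` argument.

```python
# Minimal sketch of the abstract's three-stage workflow:
# (1) RAG setup, (2) context-aware rubric generation, (3) context-aware evaluation.
# All function names and the toy retriever are illustrative assumptions.
from typing import Callable, List


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Stage 1 (RAG setup): rank course-material chunks by naive keyword overlap.
    A real system would use an embedding index; this placeholder keeps the sketch runnable."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]


def generate_rubric(question: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    """Stage 2: draft a context-aware rubric; instructors then review and refine it."""
    context = "\n".join(retrieve(question, corpus))
    return llm(
        f"Course context:\n{context}\n\nQuestion:\n{question}\n"
        "Draft a point-wise grading rubric with marks per criterion."
    )


def evaluate_answer(question: str, rubric: str, answer_text: str,
                    llm: Callable[[str], str]) -> str:
    """Stage 3: grade a transcribed handwritten or typed answer against the rubric."""
    return llm(
        f"Rubric:\n{rubric}\n\nQuestion:\n{question}\n"
        f"Student answer:\n{answer_text}\n"
        "Score each criterion and give brief, constructive feedback."
    )


if __name__ == "__main__":
    # Usage with a dummy LLM callable; substitute a real multi-modal LLM client here.
    course_chunks = ["Dijkstra's algorithm finds shortest paths in weighted graphs.",
                     "Breadth-first search explores graphs level by level."]
    dummy_llm = lambda prompt: "[LLM response would appear here]"
    rubric = generate_rubric("Explain Dijkstra's algorithm.", course_chunks, dummy_llm)
    print(evaluate_answer("Explain Dijkstra's algorithm.", rubric,
                          "It repeatedly picks the closest unvisited node...", dummy_llm))
```

The separation into three functions mirrors the sequential pipeline described above, so instructor review can be inserted between rubric generation and evaluation without changing either stage.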