Abstract: This paper presents our motivation, design, and two experiments for automatic scoring of handwritten descriptive answers. The first experiment is on scoring handwritten short descriptive answers in Japanese language exams. We used a deep neural network (DNN)-based handwriting recognizer and a transformer-based automatic scorer, without correcting misrecognized characters or adding rubric annotations for scoring. We achieved acceptable agreement between automatic and human scoring while using only 1.7% of the human-scored answers for training. The second experiment scores descriptive answers written on electronic paper for Japanese, English, and math drills. We used DNN-based online and offline handwriting recognizers for each subject and applied simple perfect matching of recognized candidates against the correct answers. The experiment shows that the false negative rate is reduced by combining the online and offline recognizers, and the false positive rate is reduced by rejecting answers with low recognition scores. Even with the current system, human scorers need to manually score less than 30% of the answers, with false positive (risky) scores of about 2% or less across the three subjects.
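The combining and rejection strategy described for the second experiment can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the (text, score) candidate representation, and the rejection threshold are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of the matching/rejection logic described above.
# Candidates from each recognizer are (text, score) pairs, where a
# higher score means higher recognition confidence.

def auto_score(online_candidates, offline_candidates, correct_answer,
               reject_threshold=0.5):
    """Return "correct", "incorrect", or "reject" (route to a human scorer)."""
    # Combining online and offline candidates reduces false negatives:
    # the answer is accepted if either recognizer's output matches exactly.
    candidates = online_candidates + offline_candidates

    # Rejecting low recognition scores reduces false positives:
    # if even the best candidate is low-confidence, defer to a human.
    best_text, best_score = max(candidates, key=lambda c: c[1])
    if best_score < reject_threshold:
        return "reject"

    # Simple perfect matching against the correct answer.
    for text, score in candidates:
        if score >= reject_threshold and text == correct_answer:
            return "correct"
    return "incorrect"
```

Under this sketch, only the "reject" outcomes reach human scorers, which is how the fraction requiring manual scoring can stay below 30%.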