Abstract: This paper introduces content-based similarity for the automatic scoring of handwritten descriptive answers, focusing on Japanese, English, and mathematical expressions. We conducted experiments on a collection of handwritten descriptive answers from elementary school students, comprising 37,500 Japanese, 15,896 English, and 86,264 math answers. We applied neural network-based online and offline handwriting recognizers to each answer and automatically scored the recognized candidates against the expected answers. In an initial experiment, we scored answers by exact match with the expected answers, which revealed issues and challenges, most notably that the rate of correct answers scored as wrong (false negatives) exceeded 30% in some subjects. We then propose a recognition confidence-based rejection scheme to reduce false positives. Moreover, we propose a content-based similarity measure that computes the similarity between the recognized candidates of an answer and the expected answers; based on the computed similarity, the system scores each answer as correct, wrong, or rejected. Human scorers still need to score the rejected answers as well as the false negatives, which students are likely to contest. The experiments suggest that, after applying the automatic scoring method, human scorers need to score 14.39% of answers for Japanese with 3.03% of incorrect answers scored as correct, 10.75% and 1.79%, respectively, for English, and 27.34% and 0.45% for math. These promising results underscore the system's effectiveness.
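To make the scoring pipeline concrete, the sketch below shows one plausible reading of the similarity-based decision step: the best recognition candidates for an answer are compared against the expected answers, and the highest similarity decides whether the answer is scored correct, rejected to a human scorer, or scored wrong. The similarity function (a normalized string-matching ratio), the threshold values, and all names here are illustrative assumptions, not the paper's actual measure or tuned parameters.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; in practice these would be tuned per subject
# (Japanese, English, math) to trade off false positives against the
# fraction of answers deferred to human scorers.
ACCEPT_THRESHOLD = 0.95  # similarity at or above this -> scored correct
REJECT_THRESHOLD = 0.70  # similarity in between -> rejected (human scores it)


def similarity(candidate: str, expected: str) -> float:
    """Normalized string similarity between a recognized candidate and an
    expected answer; a stand-in for the paper's content-based measure."""
    return SequenceMatcher(None, candidate, expected).ratio()


def score_answer(candidates: list[str], expected_answers: list[str]) -> str:
    """Score one handwritten answer from its recognition candidates.

    Returns 'correct', 'rejected' (deferred to a human scorer), or 'wrong',
    based on the best similarity over all candidate/expected-answer pairs.
    """
    best = max(
        similarity(c, e)
        for c in candidates
        for e in expected_answers
    )
    if best >= ACCEPT_THRESHOLD:
        return "correct"
    if best >= REJECT_THRESHOLD:
        return "rejected"
    return "wrong"


# Example: two recognition candidates for one answer, one expected answer.
# A slightly misrecognized but near-identical candidate is still accepted.
print(score_answer(["photosynthesls", "photosynthesis"], ["photosynthesis"]))
```

The three-way outcome mirrors the abstract's design: high-confidence matches are auto-scored, borderline cases are rejected rather than risked as false positives, and only rejected answers and potential false negatives remain for human scorers.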