Abstract: Oral reading fluency assessment is a process in which a student reads a passage aloud and a human listener scores the words read correctly. Current automatic reading fluency systems match these words using speech recognition models trained on clean speech from native adult speakers. This mismatch between training and deployment conditions, compounded by the many background noises of a classroom, means that student speech is often misrecognized. This paper describes a deep learning model that employs text-to-speech and contrastive learning to create acoustic word embeddings of student speech. The embedding model is trained on unlabeled recordings of students reading known passages. Our model then uses sub-sequence matching in the acoustic embedding space to estimate words read correctly per minute, a common criterion in oral reading fluency. Our model’s estimates are significantly closer to those of human listeners than those of systems that rely on automatic speech recognition alone, reducing the average error in words correct per minute from 15.1 to 8.4.
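To make the matching step concrete, the following is a minimal sketch of sub-sequence matching in an embedding space: given one acoustic embedding per spoken segment and one reference embedding per passage word, a greedy monotonic aligner counts passage words whose cosine similarity to some later student segment exceeds a threshold, and the count is converted to words correct per minute. All function names, the greedy alignment strategy, and the threshold are illustrative assumptions, not the authors' actual procedure.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def count_words_matched(student_emb, passage_emb, threshold=0.7):
    """Greedy monotonic sub-sequence matching (illustrative only):
    walk through the passage words in order and accept each word whose
    best remaining student segment exceeds the similarity threshold."""
    sim = cosine_sim(student_emb, passage_emb)  # (n_segments, n_words)
    matched = 0
    i = 0  # earliest student segment still available for alignment
    for j in range(passage_emb.shape[0]):
        if i >= student_emb.shape[0]:
            break  # ran out of student speech; remaining words unread
        k = i + int(np.argmax(sim[i:, j]))  # best segment at/after i
        if sim[k, j] >= threshold:
            matched += 1
            i = k + 1  # enforce left-to-right (monotonic) alignment
    return matched

def wcpm(matched_words, duration_seconds):
    # Words read correctly per minute.
    return matched_words * 60.0 / duration_seconds
```

For example, with a four-word passage and a student who reads the first, second, and fourth words recognizably, the aligner skips the unmatched third word and counts three correct words; a 30-second reading then yields a WCPM of 6.0.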
External IDs: dblp:conf/icassp/WangWNKNL24