A Unified Model for Oral Reading Fluency and Student Prosody

Yihao Wang, Zhongdi Wu, Joseph Nese, Akihito Kamata, Vedant Nilabh, Eric C. Larson

Published: 2025, Last Modified: 06 May 2026, ICASSP 2025, License: CC BY-SA 4.0

Abstract: In education, analyzing students' oral reading is critical for understanding their reading comprehension, vocabulary development, and fluency. Automating these assessments is an important part of language research; however, researchers often use a separate model for each assessment. Transferring knowledge between assessments can make a model more reliable and robust. In this work, we introduce a unified model comprising a contrastive self-supervised embedding model and a transfer learning suffix network. The embedding model is trained on oral reading fluency data to generate student voice embeddings. The transfer learning network then uses these embeddings as features to estimate oral reading prosody scores. Our model shows strong agreement with human raters: on unseen passages, it achieves a reliability of 0.55 (78% accuracy), compared with a reliability of 0.63 between human raters. This generalization suggests that the model can provide robust representations across various oral reading assessment tasks.
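The two-stage structure described above (a contrastively trained embedding model, followed by a separate suffix network that maps frozen embeddings to prosody scores) can be illustrated with a small NumPy sketch. This is not the authors' implementation, and several pieces are assumptions: the abstract does not name the contrastive objective, so InfoNCE is used here as one common choice; the suffix network is replaced by a plain softmax regression; the "embeddings" are synthetic random vectors rather than outputs of a speech encoder; and the four prosody levels and all function names are hypothetical.

```python
import numpy as np

NUM_LEVELS = 4  # hypothetical number of prosody score levels


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE (assumed objective): row i of `positives` is the positive for
    row i of `anchors`; all other rows in the batch act as negatives."""
    a = l2_normalize(anchors)
    p = l2_normalize(positives)
    logits = a @ p.T / temperature                 # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy on matching pairs


def train_suffix_head(features, labels, num_classes, lr=0.5, epochs=300):
    """Hypothetical stand-in for the suffix network: softmax regression
    trained by gradient descent on frozen embeddings."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                 # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    """Return the highest-scoring prosody level for each embedding."""
    return np.argmax(features @ W + b, axis=1)


rng = np.random.default_rng(0)

# Stage 1 (synthetic): two "views" of the same clips should score a lower
# contrastive loss than unrelated vectors.
clean = rng.normal(size=(64, 16))
augmented = clean + 0.05 * rng.normal(size=(64, 16))
unrelated = rng.normal(size=(64, 16))
aligned_loss = info_nce_loss(clean, augmented)
random_loss = info_nce_loss(clean, unrelated)

# Stage 2 (synthetic): fit the suffix head on frozen embeddings and score a
# held-out split, loosely mirroring evaluation on unseen passages.
labels = rng.integers(0, NUM_LEVELS, size=200)
centers = rng.normal(scale=3.0, size=(NUM_LEVELS, 16))
feats = l2_normalize(centers[labels] + rng.normal(size=(200, 16)))
W, b = train_suffix_head(feats[:150], labels[:150], NUM_LEVELS)
accuracy = np.mean(predict(feats[150:], W, b) == labels[150:])

print(f"aligned loss {aligned_loss:.3f}, random loss {random_loss:.3f}")
print(f"held-out accuracy {accuracy:.2f}")
```

Keeping the embedding model frozen while fitting only the suffix head is what lets one set of embeddings be reused across several assessment tasks; in this sketch, a different task would only require training another small head on the same features.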

External IDs: dblp:conf/icassp/WangWNKNL25