Cohort-Sensitive Labeling: An Effective Approach for Enhancing ASR Performance

Jonghwan Na, Mark Hasegawa-Johnson, Bowon Lee

Published: 2025, Last Modified: 15 Apr 2026, Venue: ICASSP 2025, License: CC BY-SA 4.0
Abstract: This paper proposes cohort-sensitive labeling (CSL) for automatic speech recognition (ASR). CSL distinguishes data labels by cohort, allowing models to learn cohort-specific information. For evaluation, we applied CSL using gender information in the training data of the LibriSpeech dataset. Experimental results demonstrate that the CSL-based approach outperforms methods without CSL, given sufficient training data. Specifically, when more than 100 hours of data were used for training, our method achieved an average word error rate reduction (WERR) of 1.81% on the LibriSpeech test-clean set and 5.76% on the test-other set. Moreover, on the TIMIT and Common Voice test sets, it achieved WERR of up to 11.52% and 2.91%, respectively, demonstrating its robustness and generalizability to unseen data. Additionally, the proposed method reached up to 97.21% accuracy in classifying the gender cohort, suggesting that ASR models trained with CSL effectively leverage the cohort information.
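The abstract does not spell out how labels are distinguished by cohort or how WERR is computed, so the following is only a minimal sketch under stated assumptions. It assumes each transcript token is tagged with a cohort suffix (the `|F` / `|M` format and the function names are hypothetical, not taken from the paper), that the cohort can be read back from predicted labels by majority vote, and that WERR denotes the relative reduction in word error rate with respect to a baseline.

```python
from collections import Counter


def to_cohort_labels(tokens, cohort):
    """Tag every token with a cohort suffix, e.g. 'hello' -> 'hello|F'.

    Assumption: the tagging format is illustrative, not the paper's.
    """
    return [f"{tok}|{cohort}" for tok in tokens]


def from_cohort_labels(labels):
    """Strip cohort tags and guess the cohort by majority vote over tags."""
    tokens, cohorts = [], []
    for label in labels:
        tok, _, cohort = label.rpartition("|")
        tokens.append(tok)
        cohorts.append(cohort)
    guess = Counter(cohorts).most_common(1)[0][0] if cohorts else None
    return tokens, guess


def relative_werr(wer_baseline, wer_system):
    """Relative WER reduction in percent (assumed definition of WERR)."""
    return 100.0 * (wer_baseline - wer_system) / wer_baseline
```

Under these assumptions, scoring would compare the stripped tokens against the plain reference transcript, while the majority-vote guess would give a per-utterance cohort prediction of the kind the reported classification accuracy refers to.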