Improving Phone Recognition Through Informed Initialization and Path-Aligned CTC Loss

Zijian Fan, Xinwei Cao, Giampiero Salvi, Torbjørn Svendsen

Published: 2025 (MLSP 2025), Last Modified: 20 Mar 2026, License: CC BY-SA 4.0
Abstract: We present a novel approach for fine-tuning ASR models for phone recognition. First, we use frame-wise phone classification with a cross-entropy loss as a means of initializing the model weights, which we call informed initialization. Second, we introduce the path-aligned CTC (PA-CTC) loss, which simplifies standard CTC by considering only the best alignment between input frames and output symbols. Experimental results show that informed initialization drastically improves phone classification and recognition performance for all fine-tuning loss functions. Furthermore, the PA-CTC loss results in models that generalize better on an out-of-domain task: phone recognition on child speech. Finally, we illustrate how our method results in less peaky models.
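To make the contrast between standard CTC and a best-alignment variant concrete, the following is a minimal NumPy sketch and not the authors' implementation. The abstract does not say how the best alignment is obtained, so this sketch assumes it is the highest-scoring path through the CTC lattice (a Viterbi-style maximum in place of the sum over all alignments). The function name `ctc_neg_log_likelihood` and the flag `best_path_only` are hypothetical and introduced only for illustration.

```python
import numpy as np


def ctc_neg_log_likelihood(log_probs, target, blank=0, best_path_only=False):
    """Negative log-likelihood of `target` under a CTC lattice.

    log_probs: (T, V) array of per-frame log-probabilities.
    target: list of label ids, without blanks.
    best_path_only: if True, score only the single best alignment
        (ASSUMPTION: this stands in for the "best alignment" idea in the
        abstract); if False, sum over all alignments as in standard CTC.
    """
    T = log_probs.shape[0]

    # Extended label sequence: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    S = len(ext)

    # Standard CTC combines paths with log-sum-exp; the best-path variant uses max.
    combine = np.max if best_path_only else np.logaddexp.reduce

    # Initialisation: a path may start in the first blank or the first label.
    alpha = np.full(S, -np.inf)
    alpha[0] = log_probs[0, ext[0]]
    if S > 1:
        alpha[1] = log_probs[0, ext[1]]

    for t in range(1, T):
        new_alpha = np.full(S, -np.inf)
        for s in range(S):
            candidates = [alpha[s]]                # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])    # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])
            new_alpha[s] = combine(candidates) + log_probs[t, ext[s]]
        alpha = new_alpha

    # A valid path ends in the last label or the trailing blank.
    finals = [alpha[S - 1]]
    if S > 1:
        finals.append(alpha[S - 2])
    return -combine(finals)


# Toy example: 2 frames, 2 symbols (blank=0, phone=1), uniform posteriors.
log_probs = np.log(np.full((2, 2), 0.5))
full_loss = ctc_neg_log_likelihood(log_probs, [1])
best_loss = ctc_neg_log_likelihood(log_probs, [1], best_path_only=True)
```

In the toy example, three alignments ("1 1", "blank 1", "1 blank") each have probability 0.25, so the standard loss is -log(0.75) while the best-path loss is -log(0.25). The best-path loss can never be smaller than the standard CTC loss, because a single alignment cannot carry more probability than the sum over all alignments.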