Max-Margin Transducer Loss: Improving Sequence-Discriminative Training Using a Large-Margin Learning Strategy
Abstract: In this work, we propose a novel sequence-discriminative training criterion for automatic speech recognition (ASR) based on the Conformer Transducer. Inspired by the large-margin classifier framework, we separate the "good" and the "bad" hypotheses in an N-best list produced from a pre-trained transducer model by a margin (τ), hence the name Max-Margin Transducer (MMT) loss. We observe that fine-tuning with the proposed loss achieves a significant improvement over the baseline transducer loss, but does not outperform state-of-the-art minimum word error rate (MWER) training. However, combining the proposed MMT loss with MWER surpasses the performance of either loss alone, suggesting that the MWER and MMT losses are complementary. With the combined losses, we obtain 7.44% and 7.68% relative WER improvements on the LibriSpeech test-clean and test-other sets, respectively, and up to 8.9% relative improvement on the Multilingual LibriSpeech test sets.
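As a rough illustration of the large-margin idea described in the abstract, the sketch below implements a generic pairwise hinge loss that pushes the scores of "good" N-best hypotheses above those of "bad" ones by at least a margin τ. This is not the paper's exact MMT formulation; the function name, tensor shapes, and margin value are illustrative assumptions.

```python
import torch

def max_margin_nbest_loss(scores, is_good, tau=1.0):
    """Hinge-style margin loss over an N-best list (illustrative sketch).

    scores : (N,) tensor of model scores for the N-best hypotheses.
    is_good: (N,) boolean tensor marking "good" hypotheses (e.g. lowest WER).
    tau    : margin by which good hypotheses must outscore bad ones.
    """
    good = scores[is_good]    # scores of "good" hypotheses
    bad = scores[~is_good]    # scores of "bad" hypotheses
    # Penalize every (good, bad) pair whose score gap falls below the margin tau.
    gaps = good.unsqueeze(1) - bad.unsqueeze(0)  # (n_good, n_bad) pairwise gaps
    return torch.relu(tau - gaps).mean()

# Example: a 4-best list in which the first two hypotheses are treated as "good".
scores = torch.tensor([-1.2, -1.5, -2.0, -3.1], requires_grad=True)
is_good = torch.tensor([True, True, False, False])
loss = max_margin_nbest_loss(scores, is_good, tau=0.5)
loss.backward()
```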