Abstract: Lattice-free maximum mutual information (LF-MMI) training, which enables MMI-based acoustic model training without any lattice generation procedure, has recently been proposed. Although LF-MMI has shown high accuracy in many tasks, its MMI criterion does not necessarily maximize speech recognition accuracy. In this work, we propose lattice-free state-level minimum Bayes risk (LF-sMBR) training, which maximizes the state-level expected accuracy without relying on a lattice generation procedure. As is the case with LF-MMI, LF-sMBR avoids redundant lattice generation by exploiting forward-backward computation over a phone N-gram space, which enables much simpler and faster training with an sMBR criterion. We found that special handling of silence phones was essential for LF-sMBR to improve accuracy. In our experiments on the AMI, CSJ, and Librispeech corpora, LF-sMBR achieved small but consistent improvements over LF-MMI acoustic models, yielding state-of-the-art results on each test set.
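For context, a standard formulation of the sMBR objective from the sequence-discriminative training literature (recalled here as background, not taken from this abstract; the acoustic scale $\kappa$ and the accuracy function $A$ are assumed notation) can be written as

\begin{equation}
% Standard sMBR objective: expected state-level accuracy over competing hypotheses.
% O_u: observations of utterance u; W_u: its reference; kappa: acoustic scale (assumed notation);
% A(W, W_u): number of frames whose state labels match the reference alignment.
\mathcal{F}_{\mathrm{sMBR}}(\theta)
  = \sum_{u=1}^{U}
    \frac{\sum_{W} p_{\theta}(O_u \mid W)^{\kappa}\, P(W)\, A(W, W_u)}
         {\sum_{W'} p_{\theta}(O_u \mid W')^{\kappa}\, P(W')}
\end{equation}

where $W$ ranges over competing hypotheses and $A(W, W_u)$ counts frames whose state labels agree with the reference $W_u$. In the lattice-free setting described above, the sums over hypotheses are evaluated by forward-backward computation over a phone N-gram denominator graph rather than over per-utterance lattices.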