Abstract: With the strong representational power of large language models (LLMs), generative error correction (GER) for automatic speech recognition (ASR) aims to provide semantic and phonetic refinements to address ASR errors. However, previous LLM-based GER methods used n-best lists, which, while simple and efficient, are vulnerable to hypothesis space collapse, a phenomenon in speech recognition where the range of possible interpretations becomes too narrow. This paper proposes replacing n-best hypotheses with lattices, a more flexible ASR output format, to improve LLM-based GER and reduce the likelihood of hypothesis space collapse. Experiments on the CSJ corpus show that, compared with n-best hypotheses, using lattices improves the performance of Japanese speech recognition.