Improved Language Models for ASR using Written Language Text

Kaustuv Mukherji, Meghna Pandharipande, Sunil Kumar Kopparapu

Published: 2022, Last Modified: 05 May 2025, Venue: NCC 2022, License: CC BY-SA 4.0

Abstract: The performance of an Automatic Speech Recognition (ASR) engine depends primarily on (a) the acoustic model (AM), (b) the language model (LM), and (c) the lexicon (Lx). While the contribution of each block to the overall performance of an ASR engine cannot be measured separately, a good LM helps improve the performance of a domain-specific ASR at a smaller cost. Generally, building an LM is greener than building an AM and is much easier for a domain-specific ASR, because it requires only domain-specific text corpora. Traditionally, because of their ready availability, written language text (WLT) corpora have been used to build LMs, though there is agreement that there is a significant difference between WLT and spoken language text (SLT). In this paper, we explore methods and techniques that can be used to convert WLT into a form that realizes a better LM to support ASR performance.