Keywords: ASR, LLM, Correction
Abstract: Automatic Speech Recognition (ASR) systems have developed rapidly and are widely used to transcribe audio into text. However, even state-of-the-art systems do not always produce accurate transcripts and can make mistakes, especially in new speech domains. To address this problem, developers either fine-tune the ASR system on domain-specific data or incorporate language models, which have been successful in natural language understanding, to rescore the system's predictions. In this work, we improve transcription quality in a post-correction setting by fine-tuning an external Large Language Model (LLM) while leaving the ASR system unchanged. We show that this approach is promising: a single fine-tuned LLM improves the outputs of several different ASR models. Our method significantly improves quality metrics compared with the baselines and competing approaches.
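As a rough illustration of the post-correction setup described above, the sketch below shows how supervised pairs for fine-tuning a correction LLM might be assembled from the outputs of several ASR systems, and how a word error rate (WER) could be computed to compare transcripts before and after correction. Everything here is an assumption for illustration: the prompt template, the `CorrectionExample` record, and the helper names are hypothetical, the abstract does not state which quality metrics were used (WER is simply a common choice for ASR), and the actual LLM fine-tuning step is not shown.

```python
from dataclasses import dataclass


@dataclass
class CorrectionExample:
    """One supervised pair for fine-tuning a post-correction LLM (hypothetical format)."""
    prompt: str
    target: str


# Hypothetical instruction template; the prompt used in the work is not specified.
PROMPT_TEMPLATE = "Correct the speech recognition transcript:\n{hypothesis}\nCorrected:"


def build_examples(hypotheses_by_asr: dict[str, list[str]],
                   references: list[str]) -> list[CorrectionExample]:
    """Pair every ASR system's hypothesis with its reference transcript.

    Mixing outputs from several ASR models exposes one LLM to different
    error patterns; the ASR models themselves are never updated.
    """
    examples = []
    for asr_name, hypotheses in hypotheses_by_asr.items():
        if len(hypotheses) != len(references):
            raise ValueError(
                f"{asr_name}: expected {len(references)} hypotheses, got {len(hypotheses)}")
        for hyp, ref in zip(hypotheses, references):
            examples.append(CorrectionExample(PROMPT_TEMPLATE.format(hypothesis=hyp), ref))
    return examples


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = {"asr_a": ["the cat sad on mat"], "asr_b": ["a cat sat on the mat"]}
    for ex in build_examples(hyps, refs):
        print(repr(ex.prompt), "->", repr(ex.target))
    print(word_error_rate(refs[0], hyps["asr_a"][0]))
```

In this toy example, `asr_a` makes one substitution and one deletion (2 errors over 6 reference words), while `asr_b` makes a single substitution. A corrected transcript would be scored the same way against the reference, so the effect of the LLM can be measured per ASR system without touching the ASR models.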
Submission Number: 73