Enhancing Speech Recognition with LLMs in Post-Correction Settings

04 Aug 2024 (modified: 26 Sept 2024), Submitted to ICOMP, CC BY 4.0
Keywords: ASR, LLM, Correction
Abstract: Automatic Speech Recognition (ASR) systems for transcribing audio into text have developed rapidly. However, even state-of-the-art systems do not always produce excellent results and can make mistakes, especially in a new speech domain. To address this problem, developers either fine-tune the ASR model on domain-specific data or incorporate language models, which have proven successful in natural language understanding, to re-score the overall predictions. In this work, we improve transcription quality in a post-correction setting by fine-tuning an external Large Language Model (LLM) without tuning the ASR system. We demonstrate that this approach is promising and that a single fine-tuned LLM improves the results of different ASR models. We significantly improve the quality metrics compared to the baselines and competitors.
Submission Number: 73