Optimizing Large Language Models with Automatic Speech Recognition for Medication Corpus in Low-Resource Healthcare Settings
Automatic Speech Recognition (ASR) systems, while effective in general contexts, often struggle in low-resource settings, especially in specialized domains such as healthcare. This study investigates the integration of Large Language Models (LLMs) with ASR systems to improve transcription accuracy in such environments. Focusing on medication-related conversations in healthcare, we fine-tuned the Whisper-Large ASR model on a custom dataset, Pharma-Speak, and applied the LLaMA 3 model for second-pass rescoring to correct ASR output errors. To achieve efficient fine-tuning without updating the full set of LLM parameters, we employed Low-Rank Adaptation (LoRA), which enables re-ranking of the ASR's N-best hypotheses while retaining the LLM's original knowledge. Our results demonstrate a significant reduction in Word Error Rate (WER) across multiple fine-tuning epochs, validating the effectiveness of the LLM-based rescoring method. The integration of LLMs in this framework shows potential for overcoming the limitations of conventional ASR models in low-resource settings. While computational constraints and the inherent strength of Whisper-Large presented some limitations, our approach lays the groundwork for further exploration of domain-specific ASR enhancement with LLMs, particularly in healthcare applications.
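The second-pass rescoring described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the interpolation weight `alpha`, the scoring interface, and the toy hypotheses are assumptions; in the paper's setup the `lm_score` callable would be a LoRA-fine-tuned LLaMA 3 computing a log-likelihood for each hypothesis.

```python
def rescore_nbest(hypotheses, lm_score, alpha=0.5):
    """Re-rank ASR N-best hypotheses with a second-pass language model.

    hypotheses: list of (text, asr_log_score) pairs from the first-pass
                ASR decoder (e.g. Whisper-Large beam search)
    lm_score:   callable mapping text -> LLM log-likelihood
                (hypothetically, a LoRA-adapted LLaMA 3 scorer)
    alpha:      interpolation weight between ASR and LLM scores
                (an assumed linear combination, a common rescoring choice)
    """
    def combined(hyp):
        text, asr_logp = hyp
        # Linearly interpolate first-pass and second-pass log-scores.
        return (1 - alpha) * asr_logp + alpha * lm_score(text)

    # Return the hypothesis text with the highest combined score.
    return max(hypotheses, key=combined)[0]
```

For example, an LLM familiar with medication phrasing can promote a lower-ranked but correct hypothesis: given first-pass candidates "take to tablets daily" and "take two tablets daily", a domain-tuned `lm_score` assigning higher likelihood to the latter lets the combined score override the ASR's original ranking.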