Mixture of Experts for Code-Switching and Mitigating Catastrophic Forgetting in Automatic Speech Recognition: A Case Study in Thai Colonoscopic Reporting
Abstract: During colonoscopy procedures, gastroenterologists operate equipment with both hands, making real-time documentation of abnormal findings impractical. This reliance on memory increases the risk of missed details and prolongs report generation. Speech recognition technology offers a potential solution, but existing models struggle with Thai-English code-switching and suffer from overfitting when fine-tuned on small datasets. To address these challenges, we propose an Automatic Speech Recognition (ASR) model enhanced with the Mixture-of-Experts (MoE) technique within the Transformer decoder layers. This approach enables domain specialization in gastroenterology (GI) while preserving Thai language knowledge, improving code-switching performance and mitigating catastrophic forgetting. Additionally, our Named Entity Recognition (NER) model extracts and categorizes GI terms from the transcriptions to streamline colonoscopic reporting. Experimental results show that our MoE-ASR model achieves low word error rates (WER; GI: 1.31%, Thai: 2.06%), with high medical-term recall (MTR: 96.53%) and non-medical-term recall (NTR: 95.85%), outperforming baseline models. The NER model also demonstrates strong performance, achieving 96.16% precision, 96.06% recall, and a 95.11% F1-score. These results highlight the effectiveness of our approach in improving real-time documentation, reducing reliance on memory, and enhancing the efficiency of colonoscopic reporting.
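To make the architectural idea in the abstract concrete, the sketch below shows what a Mixture-of-Experts feed-forward sublayer inside a Transformer decoder layer typically looks like. This is a minimal, generic illustration only: the module names, dimensions, number of experts, and the dense softmax gating used here are assumptions for clarity, not the authors' implementation.

```python
# Minimal sketch: an MoE feed-forward sublayer inside a Transformer decoder layer.
# All names, sizes, and the dense gating scheme are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoEFeedForward(nn.Module):
    """Replaces the dense feed-forward sublayer with several expert FFNs and a router."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # token-wise gating scores
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gates = F.softmax(self.router(x), dim=-1)                        # (batch, seq, E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, seq, d_model, E)
        # Dense mixture for clarity: weight every expert's output by its gate.
        return torch.einsum("bse,bsde->bsd", gates, expert_out)


class MoEDecoderLayer(nn.Module):
    """Standard decoder layer (self-attention + cross-attention) with an MoE feed-forward sublayer."""

    def __init__(self, d_model: int = 256, nhead: int = 4, d_ff: int = 1024, num_experts: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.moe_ffn = MoEFeedForward(d_model, d_ff, num_experts)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        x = tgt
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h)[0]
        x = x + self.cross_attn(self.norm2(x), memory, memory)[0]
        # Experts can specialize, e.g., one toward GI terminology, others toward general Thai.
        x = x + self.moe_ffn(self.norm3(x))
        return x


if __name__ == "__main__":
    layer = MoEDecoderLayer()
    decoder_in = torch.randn(2, 10, 256)    # (batch, target_len, d_model)
    encoder_out = torch.randn(2, 50, 256)   # (batch, source_len, d_model), e.g., acoustic features
    print(layer(decoder_in, encoder_out).shape)  # torch.Size([2, 10, 256])
```

The intuition, under these assumptions, is that routing tokens to separate expert feed-forward networks lets some experts adapt to GI-specific English terms while others retain general Thai language behavior, which is how an MoE design can help with code-switching and catastrophic forgetting.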