Healing in One's Own Tongue: A Fine-tuned LLM-Powered Speech-to-Speech Translation System for Ethiopian Healthcare

ICLR 2026 Conference Submission 21123 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: HMM-GMM, CNN-TDNNf, LLM
TL;DR: The first Amharic–Afan Oromo speech-to-speech translation system combines ASR, TTS, and translation to improve doctor–patient communication in Ethiopia’s healthcare sector.
Abstract: In Ethiopia’s multilingual healthcare landscape, language barriers between providers and patients impede accurate diagnosis and treatment. We introduce a novel speech-to-speech translation system for Amharic and Afan Oromo, two widely spoken yet low-resource Ethiopian languages, to enhance doctor–patient communication. Our system integrates automatic speech recognition (ASR), text-to-speech (TTS), and text-to-text translation, leveraging 250 hours of Amharic and 200 hours of Afan Oromo transcribed audio, alongside 6.5M preprocessed text sentences in each language. The ASR models, built with the Kaldi toolkit using HMM-GMM and CNN-TDNNf architectures, achieve word error rates of 12.36% (Amharic) and 19.6% (Afan Oromo). The TTS models, fine-tuned from SpeechT5 with speaker embeddings, reach validation losses of 0.3675 (Amharic) and 0.3662 (Afan Oromo), with Mean Opinion Scores indicating high naturalness and intelligibility. For text-to-text translation, we pre-trained mT5-large on the large-scale monolingual corpora of Amharic and Afan Oromo, then fine-tuned it on 667,021 human-edited sentence pairs to build a single bi-directional translation model. mT5 natively supports Amharic but not Afan Oromo, so continued pretraining was essential for learning Afan Oromo representations effectively. Deployed with a Flutter front-end and a Next.js back-end, our system outperforms existing solutions and enables seamless communication in healthcare settings. By reducing miscommunication, supporting precise diagnosis and documentation, and improving patient trust and satisfaction, this work advances global health goals and fosters cross-lingual research in low-resource settings.
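To make the TTS stage concrete, here is a minimal inference sketch using the standard SpeechT5 API in Hugging Face transformers. The fine-tuned checkpoint name "your-org/speecht5-amharic" is hypothetical (the paper's weights are not named here); the public microsoft/speecht5_hifigan vocoder and a placeholder x-vector are assumed, since the abstract does not specify how speaker embeddings were obtained.

```python
# Minimal sketch of SpeechT5 inference with speaker embeddings. The fine-tuned
# checkpoint name is hypothetical; a real checkpoint's processor is assumed to
# have a vocabulary adapted to the Amharic (Ge'ez) script.
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

processor = SpeechT5Processor.from_pretrained("your-org/speecht5-amharic")    # hypothetical
model = SpeechT5ForTextToSpeech.from_pretrained("your-org/speecht5-amharic")  # hypothetical
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

inputs = processor(text="ሰላም፣ ዛሬ እንዴት ነዎት?", return_tensors="pt")  # "Hello, how are you today?"

# SpeechT5 conditions on a 512-dim x-vector speaker embedding; in practice this
# would come from a speaker-verification model rather than a zero placeholder.
speaker_embeddings = torch.zeros((1, 512))

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("out.wav", speech.numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
```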
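Likewise, a single bi-directional mT5 model can serve both translation directions by marking the direction in the input. The sketch below assumes a "translate X to Y:" task-prefix scheme and a hypothetical checkpoint name; the paper's actual prompt format is not specified in the abstract.

```python
# Minimal sketch of bi-directional Amharic <-> Afan Oromo translation with one
# fine-tuned mT5 model. Checkpoint name and task prefixes are illustrative
# assumptions, not the paper's confirmed setup.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("your-org/mt5-large-am-om")            # hypothetical
model = MT5ForConditionalGeneration.from_pretrained("your-org/mt5-large-am-om")  # hypothetical

def translate(text: str, direction: str = "am2om") -> str:
    # Direction is signaled via a task prefix, one common convention for T5-family models.
    prefix = ("translate Amharic to Afan Oromo: " if direction == "am2om"
              else "translate Afan Oromo to Amharic: ")
    inputs = tokenizer(prefix + text, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(translate("ታካሚው ትኩሳት አለው።"))  # "The patient has a fever."
```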
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21123