Healing in One's Own Tongue: A Fine-tuned LLM-Powered Speech-to-Speech Translation System for Ethiopian Healthcare
Keywords: HMM-GMM, CNN-TDNNf, LLM
TL;DR: The first Amharic–Afan Oromo speech-to-speech translation system combines ASR, TTS, and translation to improve doctor–patient communication in Ethiopia’s healthcare sector.
Abstract: In Ethiopia’s multilingual healthcare landscape, language barriers between providers and patients impede accurate diagnosis and treatment. We introduce a novel speech-to-speech translation system for Amharic and Afan Oromo, two widely spoken yet low-resource Ethiopian languages, to enhance doctor-patient communication. Our system integrates automatic speech recognition (ASR), text-to-speech (TTS), and text-to-text translation, leveraging 250 hours of Amharic and 200 hours of Afan Oromo transcribed audio, alongside 6.5M Amharic and 6.5M Afan Oromo preprocessed text sentences. The ASR models, built
using the Kaldi toolkit with HMM-GMM and CNN-TDNNf architectures, achieve word error rates of
12.36% (Amharic) and 19.6% (Afan Oromo). The TTS models, fine-tuned from SpeechT5 with speaker
embeddings, yield validation losses of 0.3675 (Amharic) and 0.3662 (Afan Oromo), with Mean Opinion Scores indicating high naturalness and intelligibility. For text-to-text translation, we continued pre-training mT5-large on the large-scale monolingual corpora of Amharic and Afan Oromo, followed by fine-tuning with 667,021 human-edited sentence pairs to build a single bi-directional translation model. mT5 natively supports Amharic but not Afan Oromo, so this continued pretraining was essential for learning Afan Oromo representations. With all of these models integrated behind a Flutter front-end and a Next.js back-end, our system outperforms existing solutions and enables seamless communication in healthcare settings. By reducing miscommunication, supporting precise diagnosis and documentation, and improving patient trust and satisfaction, this work advances global health goals and fosters cross-lingual research in low-resource settings.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 21123