Universal Speech Disorder Recognition: Towards a Foundation Model for Cross-Pathology Generalisation
Keywords: ASR, Speech Disorder, Stroke, Whisper, Foundation Models
TL;DR: Fine-tuning Whisper on speech from patients with speech impairments improves ASR performance on speech disorders outside the training domain. The study shows the potential for universal speech disorder recognition in clinical applications.
Abstract: Although Automatic Speech Recognition (ASR) systems hold great potential for diagnosing and monitoring speech disorders, their clinical utility has been limited by the scarcity of pathological speech data that captures the heterogeneity of speech disorders and by the high-dimensional output space. This work presents a significant advance by fine-tuning the Whisper foundation model on our novel in-house SONIVA dataset, which comprises $\approx$600 stroke patients with diverse speech impairments. We address the critical challenge of limited training data in healthcare ASR by demonstrating that our stroke-specific model generalises effectively across diverse speech pathologies. Our model outperforms both off-the-shelf Whisper and traditional disorder-specific ASR models, achieving improved recognition accuracy on unseen SONIVA patients as well as on AphasiaBank and DementiaBank. These gains extend beyond stroke-related language impairments to other neurological disorders. This cross-pathology generalisation was achieved by training on a single disorder with a heterogeneous impairment profile, representing a major step towards more adaptable disordered speech recognition in healthcare.
Our study highlights how adapting foundation models for clinical tasks can advance universal speech disorder recognition despite data scarcity, with broad implications for diagnosis, monitoring, and patient care, potentially enabling more accessible and personalised speech therapy.
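The submission does not include implementation details, but a minimal sketch of the kind of comparison it reports is given below: transcribing a held-out pathological speech recording with an off-the-shelf Whisper checkpoint and with a fine-tuned one, then scoring both against a reference transcript using word error rate (WER). The use of the Hugging Face transformers and evaluate libraries, the checkpoint names, the audio path, and the reference text are all assumptions for illustration, not details taken from the paper.

# Minimal sketch (not the authors' code): comparing an off-the-shelf and a
# fine-tuned Whisper checkpoint on a single held-out recording via WER.
# Checkpoint names, the audio path, and the reference transcript are
# hypothetical placeholders.
import torch
import torchaudio
import evaluate
from transformers import WhisperProcessor, WhisperForConditionalGeneration

wer_metric = evaluate.load("wer")

def transcribe(checkpoint: str, audio_path: str) -> str:
    """Load a Whisper checkpoint and transcribe one mono audio file."""
    processor = WhisperProcessor.from_pretrained(checkpoint)
    model = WhisperForConditionalGeneration.from_pretrained(checkpoint)
    waveform, sr = torchaudio.load(audio_path)
    # Whisper expects 16 kHz input; resample if needed (assumes mono audio).
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)
    inputs = processor(waveform.squeeze().numpy(),
                       sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

if __name__ == "__main__":
    reference = "the patient reads the standard passage aloud"    # hypothetical
    audio = "patient_sample.wav"                                   # hypothetical
    for ckpt in ["openai/whisper-small",                           # baseline
                 "path/to/soniva-finetuned-whisper"]:              # fine-tuned
        hypothesis = transcribe(ckpt, audio)
        wer = wer_metric.compute(references=[reference], predictions=[hypothesis])
        print(f"{ckpt}: WER = {wer:.3f}")

In the study itself, the fine-tuned checkpoint would correspond to the SONIVA-adapted model, and WER would be aggregated over full test sets (unseen SONIVA patients, AphasiaBank, DementiaBank) rather than a single utterance.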
Submission Number: 97