The Sound of Syntax: Finetuning and Comprehensive Evaluation of Language Models for Speech Pathology

ACL ARR 2025 May Submission7198 Authors

20 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: According to the U.S. National Institutes of Health, approximately 5%–9% of children experience speech disorders that require clinical intervention. However, affected children outnumber certified speech-language pathologists (SLPs) by roughly twenty to one, highlighting a significant gap in care and a pressing need to automate aspects of SLP workflows. Existing AI approaches for supporting SLPs typically address individual tasks in isolation, resulting in inconsistent performance and high deployment costs. Moreover, the scarcity of annotated datasets limits progress in this domain. Recent advances in multimodal large language models (LLMs), particularly speech LLMs, offer promising opportunities for automating key SLP tasks and generating high-quality datasets. Despite this potential, speech LLMs remain largely unexplored in this context. In this work, we introduce the first unified and comprehensive benchmarking framework for five core SLP tasks: (1) disorder screening, (2) speech transcription, (3) disorder-type classification, (4) symptom identification, and (5) transcript-based classification. Furthermore, we develop a finetuning strategy based on cross-task knowledge transfer, which improves model performance across multiple tasks. Our experiments with 15 state-of-the-art LLMs show that while base models perform adequately on coarse-grained tasks, finetuning on the transcription task can yield substantial improvements across a broader set of tasks, with gains of more than 30% over baseline approaches in the best cases. We publicly release our datasets, models, and benchmark framework to support continued research in this area.

Paper Type: Long

Research Area: Special Theme (conference specific)

Research Area Keywords: Interdisciplinary Recontextualization of NLP, Speech Recognition, Text-to-Speech, Resources and Evaluation, NLP Applications, Ethics, Bias, Fairness

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources

Languages Studied: English, French

Submission Number: 7198
