SSDM: Scalable Speech Dysfluency Modeling

Jiachen Lian; Xuanru Zhou; Zoe Ezzes; Jet M.J. Vonk; Brittany T. Morin; David Paul Galang Baquirin; Zachary A. Miller; Maria Luisa Gorno-Tempini; Gopala Anumanchipalli

Published: 25 Sept 2024, Last Modified: 06 Nov 2024, NeurIPS 2024 poster, CC BY 4.0

Keywords: Speech Dysfluency, Disfluency, Stutter, Alignment, Articulatory, Scaling

TL;DR: A speech processing framework that supports language learning, speech therapy and disorder screening.

Abstract: Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions \cite{lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm} suffer from poor scalability. Second, there is no large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose \textit{SSDM: Scalable Speech Dysfluency Modeling}, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at \url{https://berkeley-speech-group.github.io/SSDM/}.

Primary Area: Speech and audio

Submission Number: 18734
