Modeling Bilingual Disfluencies with Large Language Models

Published: 18 Jun 2024, Last Modified: 26 Jul 2024
Venue: ICML 2024 Workshop on LLMs and Cognition (Poster)
License: CC BY 4.0
Keywords: Speech Disfluencies, Large Language Models
TL;DR: We propose and train models for predicting disfluencies in monolingual and bilingual speakers, using large language models.
Abstract: Speech disfluency metrics are commonly used to inform the diagnosis and treatment of various communication disorders. However, bilingual speakers exhibit distinct disfluency patterns, which makes diagnosing speech and language disorders in bilingual children and adults more difficult. We propose and train models that predict disfluencies in monolingual and bilingual speakers, using large language models (LLMs) and a modern machine learning pipeline. We use a novel bilingual dataset with detailed disfluency annotations and participant information. We find that disfluencies tend to occur at high-surprisal words, which supports surprisal theory for both monolinguals and bilinguals. We also find differences in how disfluencies manifest in bilingual and monolingual speakers.
Submission Number: 25
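
Illustration (not from the paper): the abstract relies on word surprisal, conventionally defined as the negative log probability of a word given its preceding context. The sketch below computes per-word surprisal in bits. It assumes a toy add-one-smoothed bigram model in place of an LLM, and the function names and example corpus are hypothetical; it is not the authors' pipeline.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect bigram counts, context counts, and the vocabulary.

    Toy stand-in for an LLM (an assumption for illustration only).
    """
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens[1:])          # words that can be predicted
        contexts.update(tokens[:-1])      # words that act as a context
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def surprisals(sentence, contexts, bigrams, vocab):
    """Return (word, surprisal in bits) pairs under add-one smoothing.

    Words outside the training vocabulary still get a finite score,
    but the distribution is then no longer exactly normalized.
    """
    tokens = ["<s>"] + sentence.lower().split()
    scored = []
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
        scored.append((word, -math.log2(p)))
    return scored


corpus = ["the dog barked", "the dog slept", "the cat slept"]
contexts, bigrams, vocab = train_bigram(corpus)
scores = surprisals("the cat barked", contexts, bigrams, vocab)
for word, bits in scores:
    print(f"{word}: {bits:.2f} bits")
```

In this toy corpus, "barked" never follows "cat", so it receives the highest surprisal of the three words. With an LLM, the same quantity would be read from the model's next-token probabilities instead of bigram counts.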