Few-Shot Synthetic-Only Accent Adaptation for ASR via LLM-Guided Phoneme Editing
Keywords: accented ASR, synthetic-only training, few-shot adaptation, LLM-based phoneme editing
TL;DR: We show that synthetic speech generated via few-shot accent-speaker adaptation and LLM-guided phoneme editing can improve accented ASR without using any real accented speech for fine-tuning.
Abstract: Automatic speech recognition (ASR) accuracy often degrades on accented speech due to the limited availability of accented training data. While synthetic speech has been used for augmentation, prior work typically mixes synthetic and real speech, and purely synthetic fine-tuning has shown inconsistent gains. We investigate whether synthetic data alone, generated through accent-aware phoneme editing and few-shot speaker adaptation, can improve accented ASR without any real accented speech. We propose a pipeline that adapts a text-to-speech (TTS) decoder to a target-accent speaker using fewer than ten reference utterances and employs large language model (LLM)-based phoneme editing to generate accent-specific pronunciations. The resulting synthetic speech is used to fine-tune a self-supervised ASR model. Experiments demonstrate consistent word error rate (WER) reductions on real accented speech, including in cross-speaker evaluation and ultra-low-data regimes.
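To make the phoneme-editing stage concrete, the sketch below uses a simple rule-based substitution as a stand-in for the paper's LLM-guided editor; the accent name, substitution rules, and ARPAbet strings are illustrative assumptions, not details taken from the submission.

```python
# Hypothetical stand-in for LLM-guided phoneme editing: map canonical
# ARPAbet phonemes to accent-specific variants via substitution rules.
# The rules below are illustrative examples, not the paper's actual edits.

ACCENT_RULES = {
    # e.g., dental fricatives realized as stops, /w/-/v/ merger
    "example_accent": [("DH", "D"), ("TH", "T"), ("W", "V")],
}

def edit_phonemes(phonemes, accent):
    """Apply accent-specific substitutions to a phoneme sequence.

    An LLM-based editor would replace this lookup with model-generated
    edits; the downstream TTS + ASR fine-tuning loop stays the same.
    """
    rules = dict(ACCENT_RULES.get(accent, []))
    return [rules.get(p, p) for p in phonemes]

canonical = ["DH", "AH", "W", "ER", "D"]  # "the word"
edited = edit_phonemes(canonical, "example_accent")
print(edited)  # ['D', 'AH', 'V', 'ER', 'D']
```

In the full pipeline, the edited phoneme sequences would drive the few-shot-adapted TTS decoder, and the resulting audio would be used to fine-tune the ASR model.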
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 33