Variation Outweighs Syntax: An Empirical Analysis of Data Augmentation for Low-Resource ASR

Published: 28 Apr 2026, Last Modified: 28 Apr 2026 | MSLD 2026 Poster | Everyone | CC BY 4.0
Keywords: Automatic Speech Recognition, data augmentation, low-resource languages
TL;DR: Standard data augmentation methods offer limited benefit for low-resource languages, but combining them with simple generation-based augmentation yields significant improvements.
Abstract: Standard data augmentation methods such as SpecAugment offer limited benefit in extremely low-resource settings. Furthermore, low-resource languages lack large text corpora and labeled or unlabeled audio to draw from. To address these limitations, we investigate two simple generation-based augmentation methods: Tag-Based Replacement and Random Replacement. Tag-Based Replacement uses only annotations commonly produced as part of the language documentation process, whereas Random Replacement uses no linguistic information at all. Experiments on four extremely low-resource languages reveal a synergistic effect: while individual modification-based augmentations such as SpecAugment yield marginal or inconsistent gains, combining them with generation-based augmentation reduces WER by an average of 4.7% absolute (10.0% relative), peaking at an 8.0% (13.5% relative) reduction for Nashta. Crucially, maximizing variation proves more effective than preserving syntactic coherence.
Submission Number: 12