Unsupervised ASR via Cross-Lingual Pseudo-Labeling

22 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: ASR, pseudo-labeling, self-training, unsupervised learning, multilingual
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We show that we can use acoustic models from other languages to bootstrap an unsupervised acoustic model in a new language.
Abstract: Recent work has shown that it is possible to train an *unsupervised* automatic speech recognition (ASR) system using only unpaired audio and text. Existing unsupervised ASR methods assume that no labeled data can be used for training. We argue that even if one does not have any labeled audio for a given language, there is *always* labeled data available for other languages. We show that it is possible to use character-level acoustic models (AMs) from other languages to bootstrap an *unsupervised* AM in a new language. Here, ``unsupervised'' means no labeled audio is available for the *target* language. Our approach is based on two key ingredients: (i) generating pseudo-labels (PLs) of the *target* language using some *other* language AM and (ii) constraining these PLs with a *target language model*. Our approach is effective on Common Voice: e.g., transferring an English AM to Swahili achieves 18\% WER. It also outperforms character-based wav2vec-U 2.0 by 15\% absolute WER on LJSpeech, using 800 hours of labeled German data instead of 60,000 hours of unlabeled English data.
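To make the abstract's two ingredients concrete, here is a minimal Python sketch of one plausible reading of the method. Only the two ingredients come from the abstract: a source-language AM produces the PLs, and a target-language LM constrains them during decoding. The function names, the callable-based decoding interface, and the iterative self-training loop are illustrative assumptions, not the authors' implementation.

```python
# Sketch of cross-lingual pseudo-labeling. All names here (`source_am`,
# `decode_with_target_lm`, `finetune`) are hypothetical placeholders,
# not the paper's actual code.

def generate_pseudo_labels(source_am, decode_with_target_lm, unlabeled_audio):
    """(i) Run the *other*-language acoustic model on target-language audio,
    (ii) constrain the hypotheses with a *target*-language LM during decoding."""
    pseudo_labels = []
    for audio in unlabeled_audio:
        emissions = source_am(audio)                   # char-level emissions from, e.g., an English AM
        transcript = decode_with_target_lm(emissions)  # e.g., LM-fused beam search with a Swahili LM
        pseudo_labels.append((audio, transcript))
    return pseudo_labels


def bootstrap_target_am(source_am, decode_with_target_lm, finetune,
                        unlabeled_audio, rounds=3):
    """Iteratively re-label and retrain: the source-language AM seeds the first
    round; later rounds re-label with the improving target-language AM.
    (Whether and how the paper iterates is an assumption of this sketch.)"""
    am = source_am
    for _ in range(rounds):
        pls = generate_pseudo_labels(am, decode_with_target_lm, unlabeled_audio)
        am = finetune(am, pls)                         # treat PLs as if they were labels
    return am
```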
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5862