Two Front-Ends, One Model: Fusing Heterogeneous Speech Features for Low-Resource ASR with Multilingual Pre-Training

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: Transfer learning is widely applied in deep learning-based speech tasks, especially those with limited data. Recent studies of transfer learning have focused mainly on either the supervised or the self-supervised perspective. This work instead combines the two schemes for low-resource automatic speech recognition (ASR) in minority and endangered language (EL) communities. We propose a general framework that uses learned transformations to resolve time-resolution differences between arbitrary speech features, allowing any self-supervised representations to be fused with the spectral features used in multilingual pre-training. Experiments on two low-resource languages and three ELs show that the proposed framework reduces the average word error rate from 45.4% to 35.5%, a 9.9% absolute improvement.
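The abstract only names the technique, so below is a minimal sketch of how such a learned time-resolution alignment and fusion could look, assuming PyTorch, a 2:1 frame-rate ratio (e.g., 100 Hz filterbank features vs. a 50 Hz self-supervised encoder such as wav2vec 2.0), and concatenation fusion. The class and parameter names (`FeatureFuser`, `d_spec`, `d_ssl`, `d_model`) are hypothetical, not the paper's actual implementation:

```python
import torch
import torch.nn as nn


class FeatureFuser(nn.Module):
    """Hypothetical sketch: align two feature streams with different
    frame rates via a learned strided convolution, then fuse them by
    projection and concatenation."""

    def __init__(self, d_spec=80, d_ssl=768, d_model=512, stride=2):
        super().__init__()
        # Learned transformation: downsample the higher-rate spectral
        # stream to the self-supervised frame rate (stride = rate ratio).
        self.resample = nn.Conv1d(d_spec, d_model,
                                  kernel_size=2 * stride,
                                  stride=stride,
                                  padding=stride // 2)
        self.proj_ssl = nn.Linear(d_ssl, d_model)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, spec, ssl):
        # spec: (B, T_spec, d_spec), higher frame rate (e.g., 100 Hz)
        # ssl:  (B, T_ssl,  d_ssl),  lower frame rate (e.g., 50 Hz)
        x = self.resample(spec.transpose(1, 2)).transpose(1, 2)
        y = self.proj_ssl(ssl)
        T = min(x.size(1), y.size(1))  # trim any off-by-one from padding
        return self.fuse(torch.cat([x[:, :T], y[:, :T]], dim=-1))


# Usage: 2 s of audio as 100 Hz filterbanks and 50 Hz SSL features.
fuser = FeatureFuser()
spec = torch.randn(2, 200, 80)
ssl = torch.randn(2, 100, 768)
fused = fuser(spec, ssl)  # -> (2, 100, 512), ready for an ASR encoder
```

A strided convolution is only one choice for the learned transformation; linear interpolation or frame repetition would serve the same alignment role, which reflects the generality the abstract claims for arbitrary feature pairs.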