A Discrete and Variational Approach to Speech Representation Learning

18 Sept 2023 (modified: 02 May 2024), ICLR 2024 Conference Withdrawn Submission
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: speech representation learning, self-supervised learning, predictive coding, variational learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A variational perspective on self-supervised speech representation learning through predictive coding.
Abstract: Previous work on self-supervised speech representation learning has taken diverse forms. However, it is plausible that there exists a learning objective that connects, or even generalizes across, these distinct approaches. In this paper we propose a variational perspective that extends recent approaches such as HuBERT and VQ-APC, and draws connections to VQ-CPC and wav2vec 2.0. We show that previous work can be formulated as a discrete latent variable model via predictive coding, and that the proposed loss function provides an optimization advantage over other approaches. The representations learned through the proposed approach obtain sizable improvements on phonetic classification, speaker verification, and automatic speech recognition. Moreover, the variational principle not only provides a unification of approaches but also an information-theoretic lens for analyzing the learning of representations. We use the KL term and the reconstruction term of the variational objective, also known as the rate and the distortion, to inspect the training dynamics. The outcome reveals that a model achieves superior downstream performance when the KL divergence between distinct signal components, rather than the distortion, is minimized.
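To make the rate-distortion decomposition mentioned in the abstract concrete, here is a minimal sketch (not the authors' code) of a variational loss with a discrete latent: the distortion is a predictive-coding reconstruction term, and the rate is the KL divergence from the posterior over K codebook entries to a uniform prior. All function and variable names are hypothetical illustrations.

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(logits, recon, target, beta=1.0):
    """Sketch of an ELBO-style objective: distortion + beta * rate.

    logits: (batch, time, K) unnormalized posterior over K discrete codes.
    recon:  (batch, time, dim) model prediction of future frames.
    target: (batch, time, dim) frames being predicted (predictive coding).
    """
    # Distortion: reconstruction error of the predicted frames.
    distortion = F.mse_loss(recon, target)

    # Rate: KL(q(z|x) || p(z)) with a uniform prior over the K codes.
    log_q = F.log_softmax(logits, dim=-1)
    q = log_q.exp()
    K = logits.size(-1)
    log_p = -torch.log(torch.tensor(float(K)))  # log of uniform prior 1/K
    rate = (q * (log_q - log_p)).sum(dim=-1).mean()

    return distortion + beta * rate, distortion, rate
```

Under this framing, tracking the two returned terms separately over training is what the abstract refers to as inspecting the training dynamics through rate and distortion.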
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1449