Disentangling Textual and Acoustic Features of Neural Speech Representations

Hosein Mohebbi; Grzegorz Chrupała; Willem Zuidema; Afra Alishahi; Ivan Titov

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0

Keywords: Disentangling Representations, Spoken Language Processing, Speech Emotion Recognition, Interpretability

Abstract: Neural speech models build entangled internal representations, which capture a variety of features (e.g., pitch, loudness, syntax, or semantics of an utterance) in a distributed encoding. This entanglement makes it difficult to track how these representations rely on textual and acoustic information when used in downstream applications, which limits their interpretability. In this paper, we build on the Information Bottleneck principle to propose a disentanglement framework that separates the speech representations learned by pre-trained neural speech models into two distinct components: one encoding content (i.e., what can be transcribed as text) and the other encoding acoustic features relevant to a downstream task. We apply our framework to two target tasks, emotion recognition and speaker identification, and evaluate it on both, quantifying the contribution of textual and acoustic features at each model layer. We also use our disentanglement framework as an attribution method to identify the most salient speech frame representations from both the textual and the acoustic perspectives.
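As a rough illustration of the Information Bottleneck principle the abstract builds on, here is the widely used variational form of the IB objective (the deep variational information bottleneck of Alemi et al., 2017). A stochastic code z is trained to predict a target, while a KL penalty weighted by beta pushes z toward an uninformative standard-normal prior, so z keeps only what the target needs. This is a minimal sketch in plain Python for a single example with a diagonal-Gaussian encoder. The function names, the beta value, and the single-bottleneck setup are hypothetical illustrations. It is not the paper's exact objective, architecture, or two-component training procedure.

```python
import math
import random
from typing import Sequence


def sample_z(mu: Sequence[float], log_var: Sequence[float]) -> list:
    """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def gaussian_kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    This is the compression term: it upper-bounds how much information z keeps about the input.
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class (log-sum-exp for stability)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def vib_loss(logits: Sequence[float], target: int,
             mu: Sequence[float], log_var: Sequence[float], beta: float = 1e-3) -> float:
    """Hypothetical per-example VIB loss: prediction term plus beta-weighted compression term.

    `logits` would come from a classifier applied to a sample of z (e.g. via sample_z).
    """
    return cross_entropy(logits, target) + beta * gaussian_kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    # Toy numbers only: a 2-dimensional code and a 2-class target.
    print(vib_loss([0.0, 0.0], 0, [1.0, 0.0], [0.0, 0.0], beta=1.0))
```

Raising beta trades predictive accuracy for a more compressed code. In a disentanglement setting one could, for instance, train one such bottleneck toward transcription and another toward the downstream label; how the paper actually couples its textual and acoustic components is specified in the paper itself, not in this sketch.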

Supplementary Material: zip

Primary Area: applications to computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11174
