Principled probing of foundation models in the auditory modality

Published: 10 Oct 2024, Last Modified: 01 Nov 2024 · NeurIPS 2024 Workshop on Behavioral ML · CC BY 4.0
Keywords: Audio, Perception, Fine-grain, Sounds, Probing, Foundation models
TL;DR: We leverage ecological theories of sound perception and a perceptually calibrated sound dataset to probe audio foundation models.
Abstract: We leverage ecological theories of sound perception in humans and a carefully designed dataset of perceptually calibrated sounds to develop and carry out principled, fine-grained probing of foundation models in the auditory modality. We show that the internal activations of the state-of-the-art audio foundation model BEATs correlate better with perceptual dimensions than those of a supervised audio classification model and a text-audio multimodal model, and that all models fail to represent at least one perceptual dimension. We also report preliminary evidence that directions aligning invariantly with a perceptual dimension can be identified within the representation space at inner layers of the BEATs model. We briefly discuss future work and potential applications.
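The kind of probing the abstract describes can be sketched as a linear probe: regress a perceptually calibrated rating onto a layer's activations and measure how well held-out predictions correlate with the ratings. The snippet below is a minimal illustration on synthetic data; all names and data are hypothetical stand-ins (in the paper, activations would come from inner layers of a model such as BEATs, and targets from the calibrated sound dataset).

```python
import numpy as np

# Hypothetical linear-probe sketch: fit ridge regression from "activations"
# to a perceptual rating, then score held-out correlation. Synthetic data
# stands in for real model activations and calibrated perceptual ratings.
rng = np.random.default_rng(0)

n_sounds, dim = 200, 64
activations = rng.normal(size=(n_sounds, dim))   # stand-in for layer activations
direction = rng.normal(size=dim)                 # a latent "perceptual direction"
ratings = activations @ direction + 0.1 * rng.normal(size=n_sounds)

train, test = slice(0, 150), slice(150, None)

# Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y
lam = 1.0
X, y = activations[train], ratings[train]
w = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)

# Correlation between probe predictions and ratings on held-out sounds
pred = activations[test] @ w
r = np.corrcoef(pred, ratings[test])[0, 1]
print(f"held-out correlation: {r:.3f}")
```

A high held-out correlation at some layer would suggest the probed dimension is linearly decodable there; comparing such scores across layers and across models is one way to operationalize the comparisons the abstract reports.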
Submission Number: 85