Principled probing of foundation models in the auditory modality

Published: 10 Oct 2024, Last Modified: 01 Nov 2024 · NeurIPS 2024 Workshop on Behavioral ML · CC BY 4.0
Keywords: Audio, Perception, Fine-grain, Sounds, Probing, Foundation models
TL;DR: We leverage ecological theories of sound perception and a perceptually calibrated sound dataset to probe audio foundation models.
Abstract: We leverage ecological theories of sound perception in humans and a carefully designed dataset of perceptually calibrated sounds to develop and carry out principled, fine-grained probing of foundation models in the auditory modality. We show that the internal activations of the state-of-the-art audio foundation model BEATs correlate better with perceptual dimensions than those of a supervised audio classification model and a text-audio multimodal model, and that all models fail to represent at least one perceptual dimension. We also report preliminary evidence that directions aligning invariantly with a perceptual dimension can be identified within the representation space at inner layers of the BEATs model. We briefly discuss future work and potential applications.
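The kind of probing the abstract describes can be sketched as a linear probe: regress a perceptually calibrated rating onto a layer's activations and measure how well held-out predictions correlate with the ratings. The snippet below is a minimal illustration on synthetic data; all names and data are hypothetical stand-ins (in the paper, activations would come from inner layers of a model such as BEATs, and targets from the calibrated sound dataset).

```python
import numpy as np

# Hypothetical linear-probe sketch: fit ridge regression from "activations"
# to a perceptual rating, then score held-out correlation. Synthetic data
# stands in for real model activations and calibrated perceptual ratings.
rng = np.random.default_rng(0)

n_sounds, dim = 200, 64
activations = rng.normal(size=(n_sounds, dim))   # stand-in for layer activations
direction = rng.normal(size=dim)                 # a latent "perceptual direction"
ratings = activations @ direction + 0.1 * rng.normal(size=n_sounds)

train, test = slice(0, 150), slice(150, None)

# Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y
lam = 1.0
X, y = activations[train], ratings[train]
w = np.linalg.solve(X.T @ X + lam * np.eye(dim), X.T @ y)

# Correlation between probe predictions and ratings on held-out sounds
pred = activations[test] @ w
r = np.corrcoef(pred, ratings[test])[0, 1]
print(f"held-out correlation: {r:.3f}")
```

A high held-out correlation at some layer would suggest the probed dimension is linearly decodable there; comparing such scores across layers and across models is one way to operationalize the comparisons the abstract reports.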
Submission Number: 85