Keywords: Zero-shot audio classifiers, Post-hoc explanations
TL;DR: We propose a post-hoc explanation method for zero-shot audio classifiers.
Abstract: Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Zero-Shot Audio Classifiers), which, to the best of our knowledge, is the first decoder-based post-hoc method for explaining the decisions of zero-shot audio classifiers. The proposed method utilizes a novel loss function that aims to closely reproduce the original similarity patterns between text and audio pairs in the generated explanations. We provide an extensive evaluation using the Contrastive Language-Audio Pretraining (CLAP) model to showcase that our interpreter remains faithful to the decisions in a zero-shot classification context. Moreover, we qualitatively show that our method produces meaningful explanations that correlate well with different text prompts.
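To make the idea of "reproducing the original similarity patterns between text and audio pairs" concrete, below is a minimal, hypothetical PyTorch sketch of a similarity-preservation term: it compares the text-audio cosine-similarity matrix computed from the original audio embeddings with the one computed from the explanation-masked audio embeddings. The function name, the use of a plain MSE penalty, and the embedding shapes are illustrative assumptions, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def similarity_preservation_loss(text_emb, audio_emb, masked_audio_emb):
    """Hypothetical sketch: encourage the masked (explained) audio to keep
    the original text-audio similarity pattern of a CLAP-like model."""
    # Cosine-similarity matrices between all text prompts and audio clips
    sim_orig = F.normalize(text_emb, dim=-1) @ F.normalize(audio_emb, dim=-1).T
    sim_mask = F.normalize(text_emb, dim=-1) @ F.normalize(masked_audio_emb, dim=-1).T
    # Penalize deviations of the explanation's similarities from the originals
    return F.mse_loss(sim_mask, sim_orig)

# Toy usage with random embeddings (4 prompts/clips, embedding dim 512)
text_emb = torch.randn(4, 512)
audio_emb = torch.randn(4, 512)
masked_audio_emb = audio_emb + 0.01 * torch.randn(4, 512)
print(similarity_preservation_loss(text_emb, audio_emb, masked_audio_emb))
```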
Primary Area: Interpretability and explainability
Submission Number: 10433