Zero-Shot Audio Classification Using Synthesised Classifiers and Pre-Trained Models

Zheng Gu, Xinzhou Xu, Shuo Liu, Björn W. Schuller

Published: 2022, Last Modified: 12 Jun 2025, CISP-BMEI 2022, CC BY-SA 4.0

Abstract: Audio classification gives a machine the ability to recognise the source of an audio sample. Unlike the conventional setting, zero-shot learning allows an audio classifier to handle new audio sources that do not appear during training. However, current zero-shot audio classification methods are limited in their ability to extract discriminative information from seen-class audio samples, which leads to very limited performance when transferring knowledge for representing audio features. To this end, we propose an approach that uses multiple synthesised classifiers and pre-trained models to jointly optimise several phantom discriminative classifiers on audio features generated by pre-trained ResNet models. Our experimental results on the ESC-50 dataset validate the effectiveness of the proposed approach in comparison with state-of-the-art approaches.
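The abstract does not spell out the formulation, so the following is only a minimal sketch of the general synthesised-classifier idea, not the authors' implementation. It assumes that each class has a semantic embedding, that each phantom classifier is paired with its own point in that semantic space, and that a class's linear classifier is a convex combination of the phantom classifiers, weighted by closeness in that space. All function names, the Gaussian-style weighting, the `sigma` parameter, and the toy numbers are illustrative assumptions. In practice the phantom classifiers would be learned on seen classes and the features would come from a pre-trained network; here both are set by hand.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def combination_weights(class_emb, phantom_embs, sigma=1.0):
    """Convex weights: phantoms closer to the class embedding get more weight."""
    logits = [
        -sum((a - b) ** 2 for a, b in zip(class_emb, p)) / (2.0 * sigma ** 2)
        for p in phantom_embs
    ]
    return softmax(logits)

def synthesise_classifier(class_emb, phantom_embs, phantom_clfs, sigma=1.0):
    """Build one linear classifier as a weighted sum of phantom classifiers."""
    w = combination_weights(class_emb, phantom_embs, sigma)
    dim = len(phantom_clfs[0])
    return [sum(w[r] * phantom_clfs[r][j] for r in range(len(w))) for j in range(dim)]

def predict(feature, classifiers):
    """Return the index of the class whose classifier scores the feature highest."""
    scores = [sum(f * c for f, c in zip(feature, clf)) for clf in classifiers]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy setup (hypothetical values): two phantoms in a 2-D semantic space,
# each paired with a 2-D linear classifier over audio features.
phantom_embs = [[0.0, 0.0], [10.0, 0.0]]
phantom_clfs = [[1.0, 0.0], [0.0, 1.0]]

# Semantic embeddings of two unseen classes; no audio from them is needed
# to build their classifiers.
unseen_class_embs = [[0.0, 0.0], [10.0, 0.0]]
classifiers = [
    synthesise_classifier(e, phantom_embs, phantom_clfs) for e in unseen_class_embs
]

print(predict([2.0, 0.5], classifiers))  # 0
print(predict([0.1, 3.0], classifiers))  # 1
```

The point of the construction is that a classifier for an unseen class can be produced from its semantic embedding alone, which is what makes the zero-shot setting workable.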