Keywords: Perceptual Features, Explainable Artificial Intelligence, Music Audio Tagging
TL;DR: This study introduces an interpretable music tagging model that combines symbolic, neural, and signal-based features, achieving strong performance while enhancing transparency.
Abstract: In the age of music streaming platforms, the task of automatically tagging music audio has garnered significant attention, driving researchers to devise methods aimed at enhancing performance metrics on standard datasets. Most recent approaches rely on deep neural networks, which, despite their impressive performance, are opaque, making it challenging to explain their output for a given input. While the issue of interpretability has been emphasized in other fields such as medicine, it has received comparatively little attention in music-related tasks. In this study, we explored the relevance of interpretability in the context of automatic music tagging. We constructed a workflow that incorporates three different information extraction techniques: a) leveraging symbolic knowledge, b) utilizing auxiliary deep neural networks, and c) employing signal processing to extract perceptual features from audio files. These features were subsequently used to train an interpretable machine-learning model for tag prediction. We conducted experiments on two datasets, namely the MTG-Jamendo dataset and the GTZAN dataset. Our method surpassed the performance of baseline models in both tasks and, in certain instances, demonstrated competitiveness with the current state of the art. We conclude that there are use cases where the deterioration in performance is outweighed by the value of interpretability.
This work was presented at ICASSP 2024 (https://ieeexplore.ieee.org/abstract/document/10669903) and can be classified under the category 'Other'.
Submission Number: 84