Leveraging Diffusion Models For Predominant Instrument Recognition

Published: 23 Sept 2025, Last Modified: 08 Nov 2025 · AI4Music · CC BY 4.0
Keywords: Predominant Instrument Recognition, Diffusion, Music Information Retrieval
TL;DR: We investigate probing a pre-trained diffusion model to extract intermediate outputs for use in the downstream task of predominant instrument recognition.
Abstract: Predominant Instrument Recognition (PIR) remains a challenge in Music Information Retrieval (MIR), primarily due to data limitations. Recent work suggests that generative diffusion models learn rich timbre representations from such limited datasets, yet their utility for recognition tasks has not been explored. We present the first study probing intermediate diffusion features for PIR. Starting from a pretrained diffusion model, we fine-tune variants on IRMAS (the premier PIR dataset) and OpenPIR, a new metadataset of multi-predominant annotations for OpenMic that we introduce. We sample activations across noise levels and layers and evaluate them with lightweight classifier heads. Results show that low-noise bottleneck features are the most informative, and even simple Multi-Layer Perceptron (MLP) probes achieve promising results. Incorporating OpenPIR improves performance across models, with diffusion features rivaling baselines for certain instruments. These findings provide early evidence that audio diffusion models encode discriminative features, pointing toward the need for further research into unified diffusion-recognition frameworks.
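
A minimal sketch of the probing setup the abstract describes: capture activations from a chosen layer of a pretrained diffusion backbone at a given noise level, pool them into a fixed-size embedding, and train a small MLP head on top. Everything named here (the `bottleneck` hook point, the `unet(noisy, t)` call signature, the cosine noising schedule, the probe sizes) is an illustrative assumption, not the paper's implementation.

```python
# Illustrative sketch only -- not the authors' code. Assumes a U-Net-style audio
# diffusion backbone whose forward pass is unet(noisy_mel, t) and which exposes a
# submodule named "bottleneck"; both are assumptions for this example.
import torch
import torch.nn as nn


def extract_probe_features(unet, mel, t, layer_name="bottleneck"):
    """Run one denoising forward pass at noise level t and return pooled
    activations captured at `layer_name`."""
    captured = {}

    def hook(_module, _inputs, output):
        captured["h"] = output.detach()

    handle = dict(unet.named_modules())[layer_name].register_forward_hook(hook)
    # DDPM-style noising with a cosine schedule (an assumption; the abstract only
    # states that activations are sampled across noise levels).
    noise = torch.randn_like(mel)
    alpha_bar = torch.cos(t * torch.pi / 2) ** 2
    noisy = alpha_bar.sqrt() * mel + (1.0 - alpha_bar).sqrt() * noise
    with torch.no_grad():
        _ = unet(noisy, t)  # the denoised output itself is discarded
    handle.remove()
    # Average-pool the time/frequency axes into a fixed-size clip embedding.
    return captured["h"].mean(dim=(-2, -1))


class MLPProbe(nn.Module):
    """Lightweight classifier head over frozen diffusion features."""

    def __init__(self, feat_dim, n_instruments):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_instruments),  # one logit per instrument class
        )

    def forward(self, x):
        return self.net(x)  # train with BCE-with-logits for multi-label PIR
```

Usage would look roughly like `logits = probe(extract_probe_features(unet, mel, torch.tensor(0.05)))`, with the probe trained under a multi-label binary cross-entropy objective; the low noise level reflects the abstract's finding that low-noise bottleneck features are the most informative.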
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
Submission Number: 109