Keywords: Music Information Retrieval, Polyphonic Instrument Classification, Data Augmentation, Out-of-Distribution Generalization
TL;DR: This paper introduces a polyphonic data augmentation method based on the Persian dastgāh musical system, achieving strong out-of-distribution generalization on real-world polyphonic recordings.
Abstract: Musical instrument classification is essential for music information retrieval (MIR) and generative music systems. However, research on non-Western traditions, particularly Persian music, remains limited. We address this gap by introducing a new dataset of isolated recordings covering seven traditional Persian instruments, two common instruments of non-Persian origin (violin and piano), and vocals. We propose a culturally informed data augmentation strategy that generates realistic polyphonic mixtures from monophonic samples. Using the MERT model (Music undERstanding with large-scale self-supervised Training) with a classification head, we evaluate our approach on out-of-distribution data obtained by manually labeling segments of traditional songs. On real-world polyphonic Persian music, the proposed method yielded the best ROC-AUC (0.795), highlighting the complementary benefits of tonal and temporal coherence. These results demonstrate the effectiveness of culturally grounded augmentation for robust Persian instrument recognition and provide a foundation for culturally inclusive MIR and diverse music generation systems.
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
(Optional) Short Video Recording Link: https://drive.google.com/file/d/1QXNbhdMCHFL7dYL_IYB7exM8wvQAs1z_/view
Submission Number: 38