Keywords: MIR, Mamba, State Space Models, Music, Genre Classification, Global Key Classification, Emotion Regression, Instrument Classification, Pitch Classification
TL;DR: A Mamba-based model adapted to music information retrieval tasks
Abstract: Music Information Retrieval (MIR) tasks on raw audio have traditionally been tackled using convolutional neural networks (CNNs) and transformer-based models. While CNNs effectively capture local structures and transformers leverage attention for long-range dependencies, both architectures come with computational and scalability challenges. In this study, we introduce a novel extension of Mamba tailored to music. Our resulting method, MoMamba (Music Oriented Mamba), is a lightweight Mamba-based music classification model. We evaluate MoMamba's performance across several benchmark MIR tasks. Our results show that MoMamba consistently outperforms a number of baselines, including an existing Mamba-based method, on all of the benchmark datasets we considered. Importantly, all models were trained from scratch without any pretraining, making the performance gains especially notable since they cannot be attributed to transfer learning. Additionally, our model's performance rivals existing benchmarks from models pretrained on much larger datasets. Our work highlights the advantages of MoMamba in music analysis and retrieval, such as accuracy and inference time, and encourages further research into its capabilities within the MIR domain.
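For context on the general approach the abstract describes, the following is a minimal illustrative sketch (not the authors' MoMamba architecture, whose details are not given here) of how a lightweight Mamba-based classifier can be applied to raw-audio MIR tasks. It assumes PyTorch and the `mamba_ssm` package (whose fast kernels require a CUDA build); the class name, layer sizes, and convolutional front end are hypothetical choices for illustration.

```python
# Illustrative sketch only: a small Mamba-based classifier over raw audio.
# Assumes `pip install mamba-ssm` with CUDA support; not the paper's MoMamba.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # selective state space (Mamba) block


class TinyMambaAudioClassifier(nn.Module):
    def __init__(self, n_classes: int, d_model: int = 128, n_layers: int = 4):
        super().__init__()
        # Strided 1-D conv front end: raw waveform -> frame-level embeddings.
        self.frontend = nn.Conv1d(1, d_model, kernel_size=1024, stride=512)
        self.blocks = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers)
        )
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) raw audio
        x = self.frontend(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, d_model)
        for block in self.blocks:
            x = x + block(self.norm(x))  # residual Mamba blocks over the frame sequence
        return self.head(x.mean(dim=1))  # mean-pool over frames, then classify
```

Because the Mamba blocks scan the sequence in linear time, a model of this shape avoids the quadratic attention cost of transformers over long audio frame sequences, which is the scalability motivation the abstract refers to.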
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20385