Keywords: MIR, Mamba, State Space Models, Music, Genre Classification, Global Key Classification, Emotion Regression, Instrument Classification, Pitch Classification
TL;DR: A Mamba-based model adapted to music information retrieval tasks
Abstract: Music Information Retrieval (MIR) tasks on raw audio have traditionally been tackled using convolutional neural networks (CNNs) and transformer-based models. While CNNs effectively capture local structures and transformers leverage attention for long-range dependencies, both architectures come with computational and scalability challenges. In this study, we introduce a novel extension of Mamba tailored to music. Our resulting method, MoMamba (Music Oriented Mamba), is a lightweight Mamba-based music classification model. We evaluate MoMamba's performance across several benchmark MIR tasks. Our results show that MoMamba consistently outperforms a number of baselines, including an existing Mamba-based method, on all of the benchmark datasets we considered. Importantly, all models were trained from scratch without any pretraining, making the performance gains especially notable since they cannot be attributed to transfer learning. Additionally, our model's performance rivals existing benchmarks from models pretrained on much larger datasets. Our work highlights the advantages of MoMamba in music analysis and retrieval, such as accuracy and inference time, and encourages further research into its capabilities within the MIR domain.
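For context on the general approach the abstract describes, the following is a minimal illustrative sketch (not the authors' MoMamba architecture, whose details are not given here) of how a lightweight Mamba-based classifier can be applied to raw-audio MIR tasks. It assumes PyTorch and the `mamba_ssm` package (whose fast kernels require a CUDA build); the class name, layer sizes, and convolutional front end are hypothetical choices for illustration.

```python
# Illustrative sketch only: a small Mamba-based classifier over raw audio.
# Assumes `pip install mamba-ssm` with CUDA support; not the paper's MoMamba.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # selective state space (Mamba) block


class TinyMambaAudioClassifier(nn.Module):
    def __init__(self, n_classes: int, d_model: int = 128, n_layers: int = 4):
        super().__init__()
        # Strided 1-D conv front end: raw waveform -> frame-level embeddings.
        self.frontend = nn.Conv1d(1, d_model, kernel_size=1024, stride=512)
        self.blocks = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers)
        )
        self.norm = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # wav: (batch, samples) raw audio
        x = self.frontend(wav.unsqueeze(1)).transpose(1, 2)  # (batch, frames, d_model)
        for block in self.blocks:
            x = x + block(self.norm(x))  # residual Mamba blocks over the frame sequence
        return self.head(x.mean(dim=1))  # mean-pool over frames, then classify
```

Because the Mamba blocks scan the sequence in linear time, a model of this shape avoids the quadratic attention cost of transformers over long audio frame sequences, which is the scalability motivation the abstract refers to.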
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20385