MambaKit: Towards Modular Intelligence - Cognitive Scaffolding in Next-Generation Drum Synthesis

Published: 27 Sept 2025, Last Modified: 09 Nov 2025, NeurIPS Creative AI Track 2025, CC BY 4.0
Track: Paper
Keywords: Machine Learning, Modular intelligence, Music
TL;DR: Combining AI with deterministic musical priors yields production-ready drums from minimal training while keeping interpretable human control, suggesting that structured intelligence can beat computational brute force.
Abstract: We present MambaKit, a hybrid neural-deterministic one-shot drum synthesizer that achieves high-quality sample generation while preserving human creative control through cognitive scaffolding, addressing the tension between AI creative capability and human controllability in percussive sound design. Current drum synthesis forces a false choice: accept black-box generation with limited control, or use traditional synthesis that requires years of specialized knowledge. Our system resolves this through structured priors (sine anchors derived from root notes, pitch envelopes, and ADSR parameters) that are automatically extracted during training and exposed as interpretable controls during inference. A structured diffusion framework combines these harmonic anchors with learned noise injection, while the frequency-aware MAMBA2 architecture achieves up to 256x greater memory efficiency by matching temporal processing windows to signal characteristics. This enables 44.1 kHz raw audio synthesis on a single A100 GPU, with production-quality output emerging after a single training batch (batch size 2), although occasional instabilities remain and diminish with minimal additional training. The system preserves human creative agency through interpretable musical parameters while the AI handles acoustic complexity. Complete drum arrangements created exclusively from MambaKit samples demonstrate real-world utility. Results suggest that sustainable AI creativity emerges from structured human-AI partnerships rather than from pure computational scaling.
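The structured prior named in the abstract (a sine anchor at the root note, shaped by a pitch envelope and an ADSR amplitude envelope) can be illustrated with a short sketch. This is not MambaKit's implementation: the exponential pitch-sweep form, the linear ADSR segments, and the helper names `midi_to_hz`, `adsr`, `sine_anchor`, and `render_prior` are hypothetical choices made for illustration, and the abstract does not say how these parameters are extracted from training audio.

```python
import math

SAMPLE_RATE = 44_100  # Hz; the abstract reports 44.1 kHz output


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def adsr(n: int, attack: float, decay: float, sustain: float,
         release: float, sr: int = SAMPLE_RATE) -> list[float]:
    """Hypothetical piecewise-linear ADSR envelope of exactly n samples.

    attack, decay and release are durations in seconds; sustain is a level
    in [0, 1]. The release segment fills the last `release` seconds.
    """
    a, d, r = int(attack * sr), int(decay * sr), int(release * sr)
    hold = max(n - a - d - r, 0)
    env = [i / a for i in range(a)]                              # 0 -> 1
    env += [1.0 - (1.0 - sustain) * i / d for i in range(d)]     # 1 -> sustain
    env += [sustain] * hold                                      # sustain plateau
    env += [sustain * (1.0 - i / r) for i in range(r)]           # sustain -> 0
    return (env + [0.0] * n)[:n]                                 # pad or trim to n


def sine_anchor(root_note: int, duration: float, pitch_ratio: float = 2.0,
                pitch_tau: float = 0.02, sr: int = SAMPLE_RATE) -> list[float]:
    """Sine at the root-note frequency with an exponential downward pitch sweep.

    Instantaneous frequency starts at pitch_ratio * f0 and decays toward f0
    with time constant pitch_tau (seconds). Phase is accumulated per sample,
    so the sweep stays continuous.
    """
    f0 = midi_to_hz(root_note)
    out, phase = [], 0.0
    for i in range(int(duration * sr)):
        freq = f0 * (1.0 + (pitch_ratio - 1.0) * math.exp(-(i / sr) / pitch_tau))
        out.append(math.sin(phase))
        phase += 2.0 * math.pi * freq / sr
    return out


def render_prior(root_note: int, duration: float, env_params: tuple,
                 **pitch_kwargs) -> list[float]:
    """Hypothetical deterministic prior: pitch-swept sine anchor times ADSR."""
    anchor = sine_anchor(root_note, duration, **pitch_kwargs)
    env = adsr(len(anchor), *env_params)
    return [x * g for x, g in zip(anchor, env)]
```

For example, `render_prior(36, 0.25, (0.002, 0.05, 0.3, 0.1))` renders a quarter-second, kick-like anchor at C2 (about 65.4 Hz). In this sketch every argument is a human-readable musical control, which is the kind of interpretability the abstract attributes to the exposed parameters.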
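One possible reading of a structured diffusion framework that "combines these harmonic anchors with learned noise injection" is a forward process that noises only the residual between the target sample and its anchor, so that even the noisiest state still carries the deterministic harmonic content. The sketch below assumes exactly that. The residual formulation, the fixed cosine schedule (a stand-in for the learned noise injection the abstract mentions), and the names `alpha_bar` and `noise_around_anchor` are illustrative assumptions, not the paper's method.

```python
import math
import random


def alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Cosine schedule: cumulative signal fraction at step t of T (1.0 at t = 0).

    Stand-in only: the abstract describes *learned* noise injection, which
    this fixed schedule does not model.
    """
    def f(u: float) -> float:
        return math.cos((u / T + s) / (1.0 + s) * math.pi / 2.0) ** 2

    return f(t) / f(0)


def noise_around_anchor(target: list[float], anchor: list[float], t: int,
                        T: int, rng: random.Random) -> list[float]:
    """Hypothetical forward step that noises only the residual over the anchor.

    x_t = anchor + sqrt(abar_t) * (target - anchor) + sqrt(1 - abar_t) * eps,
    with eps ~ N(0, 1) per sample. At t = 0 this is the target; at t = T it is
    approximately anchor + eps, so the harmonic prior survives full noising.
    """
    ab = alpha_bar(t, T)
    keep, sigma = math.sqrt(ab), math.sqrt(1.0 - ab)
    return [a + keep * (x - a) + sigma * rng.gauss(0.0, 1.0)
            for x, a in zip(target, anchor)]
```

A call such as `noise_around_anchor(target, anchor, t=500, T=1000, rng=random.Random(0))` returns an intermediate state. Because the anchor is never noised away, a denoiser trained on these states would only need to model what the anchor does not already provide.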
Submission Number: 96