Keywords: State Space Models, Mamba, LLMs, Subquadratic Models
TL;DR: Mamba-3, an inference-first SSM that pushes on core SSM principles: improved discretization for better quality, complex dynamics for new capabilities, and MIMO updates for efficient inference.
Abstract: The recent scaling of test-time compute for LLMs has restricted practical deployment to models with strong capabilities that can generate high-quality outputs in an inference-efficient manner. While current Transformer-based models are the standard, their quadratic compute and linear memory bottlenecks have spurred the development of sub-quadratic models with linear-scaling compute and constant memory requirements. However, many recent linear-style models lack certain capabilities or lag behind in quality, and even their linear-time inference is not hardware-efficient. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state-space-model view of linear models. We combine 1) a more expressive recurrence, 2) a complex state-update rule that enables richer state tracking, and 3) a multi-input, multi-output formulation, resulting in a stronger model that better exploits hardware parallelism during decoding. Together with architectural refinements, our **Mamba-3** model achieves significant gains across retrieval, state-tracking, and downstream language-modeling tasks. Our new architecture sets the Pareto frontier for performance under a fixed inference budget and outperforms strong baselines in a head-to-head comparison.
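As a rough illustration of the kind of recurrence the abstract refers to, below is a minimal sketch of a single decoding step of a diagonal linear SSM with a multi-input, multi-output (MIMO) readout. The shapes, names (`mimo_ssm_step`, `a`, `B`, `C`), and decay parameterization are assumptions chosen for illustration, not the paper's actual Mamba-3 update rule; it only shows why a MIMO state update keeps constant memory per step while amortizing matmul work across channels.

```python
# Illustrative sketch only (not the Mamba-3 formulation): one decoding step of
# a diagonal linear SSM with multi-input, multi-output projections.
import numpy as np

def mimo_ssm_step(h, x, a, B, C):
    """Advance a diagonal SSM by one token.

    h: (N,)   recurrent state (constant-size memory across decoding)
    x: (P,)   current input with P channels (multi-input)
    a: (N,)   per-state decay / diagonal transition
    B: (N, P) input projection (writes all P inputs into the shared state)
    C: (Q, N) output projection (reads Q outputs from the shared state)
    """
    h = a * h + B @ x   # elementwise decay plus a rank-P input write
    y = C @ h           # multi-output readout, a dense matmul per step
    return h, y

# Toy usage: constant memory, one small-matmul update per decoded token.
N, P, Q = 16, 4, 4
rng = np.random.default_rng(0)
h = np.zeros(N)
a = np.full(N, 0.9)
B, C = rng.standard_normal((N, P)), rng.standard_normal((Q, N))
for x in rng.standard_normal((5, P)):   # five decode steps
    h, y = mimo_ssm_step(h, x, a, B, C)
```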
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13549