Bi-Mamba: Towards Accurate 1-Bit State Space Model

TMLR Paper 4185 Authors

11 Feb 2025 (modified: 21 Apr 2025) · Rejected by TMLR · CC BY 4.0
Abstract: The selective state-space model (SSM) of Mamba addresses several limitations of Transformers, such as quadratic computational complexity with sequence length and the substantial inference-time memory required by the key-value cache. However, the growing size of Mamba models continues to pose training and deployment challenges and raises environmental concerns due to considerable energy consumption. In this work, we introduce Bi-Mamba, a scalable and powerful 1-bit Mamba architecture designed for more efficient large language models, with model sizes of 780M, 1.3B, and 2.7B parameters. Bi-Mamba models are trained from scratch on a data volume comparable to regular LLM pretraining, using an autoregressive distillation loss. Extensive experimental results on language modeling show that Bi-Mamba achieves performance comparable to its full-precision counterparts (e.g., FP16 or BF16) and substantially better accuracy than post-training-binarization (PTB) Mamba and binarization-aware-training (BAT) Transformer baselines, while significantly reducing memory footprint and energy consumption relative to the original Mamba model. Our study pioneers a linear-computational-complexity LLM framework under low-bit representation and facilitates the future design of specialized hardware tailored for efficient 1-bit Mamba-based LLMs. Our code is provided in the supplementary material and the pre-trained weights are available anonymously at https://drive.google.com/drive/folders/1jfk_TlDzFbER84ITvU2hOX2VyPC9H4MA?usp=sharing.
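
The abstract mentions 1-bit weights and an autoregressive distillation loss but does not spell out the mechanics. The sketch below is only an illustration of how such components are commonly implemented, assuming BitNet-style sign binarization with a per-tensor scaling factor and a straight-through estimator, together with a KL-based distillation objective against a full-precision teacher; the class and function names (`BitLinear`, `autoregressive_distillation_loss`) are hypothetical and not taken from the paper's released code.

```python
# Hypothetical sketch, not the authors' implementation: a 1-bit linear layer and an
# autoregressive distillation loss of the kind the abstract alludes to.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Linear layer whose weights are binarized to {-1, +1} * alpha in the forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor scaling factor: mean absolute value of the latent FP weights (an assumption).
        alpha = w.abs().mean()
        w_bin = torch.sign(w) * alpha
        # Straight-through estimator: forward uses the binarized weights,
        # gradients flow through the latent full-precision weights.
        w_ste = w + (w_bin - w).detach()
        return F.linear(x, w_ste, self.bias)


def autoregressive_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student next-token distributions,
    averaged over all sequence positions (a common form of AR distillation)."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature**2


# Toy usage: a binarized projection and a distillation loss against a frozen teacher.
if __name__ == "__main__":
    layer = BitLinear(64, 128)
    y = layer(torch.randn(2, 16, 64))          # shape: (2, 16, 128)
    student_logits = torch.randn(2, 16, 50257, requires_grad=True)
    teacher_logits = torch.randn(2, 16, 50257)  # stand-in for FP16 teacher outputs
    loss = autoregressive_distillation_loss(student_logits, teacher_logits)
    loss.backward()
```

In a binarization-aware-training setup of this kind, the latent full-precision weights are kept only during training; at deployment the 1-bit weights and the scaling factor are what get stored, which is where the memory savings claimed in the abstract would come from.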
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Binhang_Yuan1
Submission Number: 4185
