QMambaExtend: Improving Long-Context Extension of Memory-Efficient Mamba Models

Published: 05 Mar 2025, Last Modified: 14 Apr 2025 · SCOPE - ICLR 2025 Poster · CC BY 4.0
Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: Mamba, Long Context Generalization, Discretization Step, SSM
Abstract:

Despite its impressive sub-quadratic compute efficiency, Mamba's effectiveness is significantly limited by its pre-training context length, with performance degrading sharply when the model must handle longer contexts. This degradation may be attributed to the out-of-distribution (OOD) discretization steps Mamba produces on longer contexts. To address this critical limitation, we introduce MambaExtend, a novel framework designed to enhance the context-extension capabilities of Mamba. Specifically, MambaExtend leverages a training-free approach that calibrates only the scaling factors of the discretization modules at each layer. To further improve model efficiency while preserving long-context understanding, we benchmark the performance of quantized variants, namely QMambaExtend. With this, for the first time, we enable a training-free context extension of up to 32x, from 2k to 64k tokens, while requiring up to a 2.1x smaller weight memory footprint. The code is available at https://github.com/ArminAzizi98/LongContextMamba.

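The core idea described in the abstract, calibrating a per-layer scaling factor on the discretization step, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example and not the authors' implementation: it assumes a simplified selective-SSM layer in which the input-dependent step size delta is multiplied by a single per-layer scalar (`delta_scale`) that is set after calibration without any gradient updates. Names such as `SimpleSSMLayer` and `set_delta_scales` are illustrative, and a real Mamba block additionally contains convolution, gating, and a hardware-efficient scan.

```python
# Minimal sketch (assumptions labeled above): per-layer rescaling of the
# SSM discretization step as a training-free calibration knob.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSMLayer(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.delta_proj = nn.Linear(d_model, d_model)   # input-dependent step size
        self.A_log = nn.Parameter(torch.randn(d_model, d_state))
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        # Training-free knob: a scalar multiplier on delta, calibrated per layer.
        self.register_buffer("delta_scale", torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        delta = F.softplus(self.delta_proj(x))          # (B, L, D)
        delta = delta * self.delta_scale                # calibrated step size
        A = -torch.exp(self.A_log)                      # (D, N), stable decay
        B = self.B_proj(x)                              # (B, L, N)
        C = self.C_proj(x)                              # (B, L, N)

        # Zero-order-hold discretization followed by a naive sequential scan
        # (illustrative only; real implementations use a parallel scan kernel).
        dA = torch.exp(delta.unsqueeze(-1) * A)         # (B, L, D, N)
        dB = delta.unsqueeze(-1) * B.unsqueeze(2)       # (B, L, D, N)
        h = torch.zeros(x.shape[0], x.shape[2], A.shape[1], device=x.device)
        outputs = []
        for t in range(x.shape[1]):
            h = dA[:, t] * h + dB[:, t] * x[:, t].unsqueeze(-1)
            outputs.append((h * C[:, t].unsqueeze(1)).sum(-1))
        return torch.stack(outputs, dim=1)              # (B, L, D)


def set_delta_scales(model: nn.Module, scales: list[float]) -> None:
    """Apply calibrated per-layer scaling factors; no gradient updates needed."""
    layers = [m for m in model.modules() if isinstance(m, SimpleSSMLayer)]
    for layer, scale in zip(layers, scales):
        layer.delta_scale.fill_(scale)
```

Because only these scalar buffers change, no weights are updated or added, which is why the calibration can also be applied on top of a quantized backbone as in the QMambaExtend variants.
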
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 87