QMambaExtend: Improving Long-Context Extension of Memory-Efficient Mamba Models

Published: 05 Mar 2025, Last Modified: 14 Apr 2025 · SCOPE - ICLR 2025 Poster · CC BY 4.0
Track: Main paper track (up to 5 pages excluding references and appendix)
Keywords: Mamba, Long Context Generalization, Discretization Step, SSM
Abstract:

Despite its impressive sub-quadratic compute efficiency, Mamba's effectiveness is significantly limited by its pre-training context length, with performance degrading sharply when the model must handle longer contexts. This degradation may be attributed to the out-of-distribution (OOD) discretization steps Mamba produces on longer contexts. To address this critical limitation, we introduce MambaExtend, a novel framework designed to enhance the context-extension capabilities of Mamba. Specifically, MambaExtend leverages a training-free approach that calibrates only the scaling factors of the discretization modules at each layer. To further improve model efficiency while preserving long-context understanding, we benchmark the performance of quantized variants, namely QMambaExtend. With this, for the first time, we enable a training-free context extension of up to 32x, from 2k to 64k tokens, while requiring up to a 2.1x smaller weight memory footprint. The code is available at https://github.com/ArminAzizi98/LongContextMamba.

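The core idea described in the abstract, calibrating a per-layer scaling factor on the discretization step, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example and not the authors' implementation: it assumes a simplified selective-SSM layer in which the input-dependent step size delta is multiplied by a single per-layer scalar (`delta_scale`) that is set after calibration without any gradient updates. Names such as `SimpleSSMLayer` and `set_delta_scales` are illustrative, and a real Mamba block additionally contains convolution, gating, and a hardware-efficient scan.

```python
# Minimal sketch (assumptions labeled above): per-layer rescaling of the
# SSM discretization step as a training-free calibration knob.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleSSMLayer(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.delta_proj = nn.Linear(d_model, d_model)   # input-dependent step size
        self.A_log = nn.Parameter(torch.randn(d_model, d_state))
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        # Training-free knob: a scalar multiplier on delta, calibrated per layer.
        self.register_buffer("delta_scale", torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        delta = F.softplus(self.delta_proj(x))          # (B, L, D)
        delta = delta * self.delta_scale                # calibrated step size
        A = -torch.exp(self.A_log)                      # (D, N), stable decay
        B = self.B_proj(x)                              # (B, L, N)
        C = self.C_proj(x)                              # (B, L, N)

        # Zero-order-hold discretization followed by a naive sequential scan
        # (illustrative only; real implementations use a parallel scan kernel).
        dA = torch.exp(delta.unsqueeze(-1) * A)         # (B, L, D, N)
        dB = delta.unsqueeze(-1) * B.unsqueeze(2)       # (B, L, D, N)
        h = torch.zeros(x.shape[0], x.shape[2], A.shape[1], device=x.device)
        outputs = []
        for t in range(x.shape[1]):
            h = dA[:, t] * h + dB[:, t] * x[:, t].unsqueeze(-1)
            outputs.append((h * C[:, t].unsqueeze(1)).sum(-1))
        return torch.stack(outputs, dim=1)              # (B, L, D)


def set_delta_scales(model: nn.Module, scales: list[float]) -> None:
    """Apply calibrated per-layer scaling factors; no gradient updates needed."""
    layers = [m for m in model.modules() if isinstance(m, SimpleSSMLayer)]
    for layer, scale in zip(layers, scales):
        layer.delta_scale.fill_(scale)
```

Because only these scalar buffers change, no weights are updated or added, which is why the calibration can also be applied on top of a quantized backbone as in the QMambaExtend variants.
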
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 87