Despite its impressive sub-quadratic compute efficiency, Mamba's effectiveness is significantly limited by its pre-training context length, with performance degrading sharply when the model must handle longer contexts. This may be attributed to the out-of-distribution (OOD) discretization steps that Mamba encounters on longer contexts. To address this critical limitation, we introduce MambaExtend, a novel framework designed to enhance the context-extension capabilities of Mamba. Specifically, MambaExtend leverages a training-free approach that calibrates only the scaling factors of the discretization modules at each layer. To further improve model efficiency alongside long-context understanding, we also benchmark the performance of quantized model variants, namely QMambaExtend. With this, for the first time, we enable a training-free context extension of up to 32x, from 2k to 64k tokens, while requiring an up to 2.1x smaller weight memory footprint. The code is available here\footnote{\href{https://github.com/ArminAzizi98/LongContextMamba}{https://github.com/ArminAzizi98/LongContextMamba}}.
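To make the calibration idea concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes each layer's discretization step (delta) can be intercepted and rescaled by a single scalar, and that those scalars are tuned with a gradient-free grid search on a small long-context calibration batch while all model weights stay frozen. The `DeltaScale` module, the `calibrate` helper, the search grid, and the `model(batch).loss` interface are all illustrative assumptions.

```python
# Hypothetical sketch of per-layer discretization-step calibration.
# Only one scalar per layer is touched; no weights are updated.
import torch

class DeltaScale(torch.nn.Module):
    """Wraps a layer's discretization step (delta) with a scalar factor."""
    def __init__(self, init_scale: float = 1.0):
        super().__init__()
        # The single per-layer quantity adjusted during calibration.
        self.scale = torch.nn.Parameter(torch.tensor(init_scale))

    def forward(self, delta: torch.Tensor) -> torch.Tensor:
        # Rescaling delta on long inputs keeps the effective step sizes
        # closer to the range seen during pre-training (mitigating OOD).
        return self.scale * delta

def calibrate(model, delta_scales, calib_batch, grid=(0.25, 0.5, 0.75, 1.0)):
    """Gradient-free grid search: per layer, keep the scale that minimizes
    loss on a small long-context calibration batch (illustrative scheme)."""
    for layer_scale in delta_scales:  # one DeltaScale per Mamba layer
        best_s, best_loss = 1.0, float("inf")
        for s in grid:
            layer_scale.scale.data.fill_(s)
            with torch.no_grad():
                # Assumes a HuggingFace-style forward returning .loss.
                loss = model(calib_batch).loss.item()
            if loss < best_loss:
                best_s, best_loss = s, loss
        layer_scale.scale.data.fill_(best_s)
```

Because only one scalar per layer is searched, a calibration pass of this shape costs a handful of forward passes rather than any backpropagation, which is what makes the approach training-free.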