Overflow Prevention Enhances Long-Context Recurrent LLMs

Assaf Ben-Kish; Itamar Zimerman; Muhammad Jehanzeb Mirza; Lior Wolf; James R. Glass; Leonid Karlinsky; Raja Giryes

Overflow Prevention Enhances Long-Context Recurrent LLMs

Assaf Ben-Kish, Itamar Zimerman, Muhammad Jehanzeb Mirza, Lior Wolf, James R. Glass, Leonid Karlinsky, Raja Giryes

Published: 08 Jul 2025, Last Modified: 26 Aug 2025COLM 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: Mamba, Sub-Quadratic Models, Long Context, Long-Range Language Modeling, RNNs

TL;DR: We identify that recurrent LLMs suffer from recurrent memory overflows that limit their performance in long-context tasks. We propose OPRM, a training-free overflow-prevention mechanism that achieves significant gains in many long-context tasks.

Abstract: A recent trend in LLMs is developing recurrent sub-quadratic models that improve long-context processing efficiency. We investigate leading large long-context models, focusing on how their fixed-size recurrent memory affects their performance. Our experiments reveal that, even when these models are trained for extended contexts, their use of long contexts remains underutilized. Specifically, we demonstrate that a chunk-based inference procedure, which identifies and processes only the most relevant portion of the input can mitigate recurrent memory failures and be effective for many long-context tasks: On LongBench, our method improves the overall performance of Falcon3-Mamba-Inst-7B by 14%, Falcon-Mamba-Inst-7B by 28%, RecurrentGemma-IT-9B by 50%, and RWKV6-Finch-7B by 51%. Surprisingly, this simple approach also leads to state-of-the-art results in the challenging LongBench v2 benchmark, showing competitive performance with equivalent size Transformers. Furthermore, our findings raise questions about whether recurrent models genuinely exploit long-range dependencies across multiple chunks, since our single-chunk strategy delivers stronger performance - even in tasks that presumably require cross-segment relations. We will release our code.

Supplementary Material: zip

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html

Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html

Submission Number: 25

Loading