SeqFusion: Scalable Long-Context Reasoning through Parallel Fragment Fusion and Memory-Augmented Attention

Published: 16 Oct 2025 · Last Modified: 10 Nov 2025 · NeurIPS 2025 ER Workshop · CC BY 4.0
Keywords: efficient reasoning; long-context reasoning
Abstract: Large Language Models (LLMs) often perform better on short-context inference ($\leq$2k tokens) than on long-context reasoning ($\geq$32k tokens), a phenomenon we term the \emph{fragmentation gap}. This gap stems from training-inference mismatch and cumulative attention drift over long sequences. We propose \textbf{SeqFusion}, a framework that bridges fragmented short-context inference and unified long-context reasoning through consistency alignment and memory linking. SeqFusion introduces a fragment-to-long alignment loss and cross-fragment memory anchors, enabling models to retain the accuracy benefits of short-context inference while preserving global consistency. Through this fragmented inference approach, SeqFusion achieves a \textbf{2-3x speedup} in long-context processing while maintaining or improving accuracy. Extensive experiments on LongBench, BookSum, and Passkey Retrieval show that SeqFusion substantially reduces tail degradation (TDS improvement of 0.15-0.25) and increases fragment-long consistency (FLC improvement of 0.2-0.35), while achieving \textbf{40-60\% memory reduction} and \textbf{2.5-3.5x higher throughput} than traditional long-context approaches.
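
The abstract gives no implementation details for the fragment-to-long alignment loss. Below is a minimal sketch of one plausible reading: align each fragment's token distributions with those produced by a single full long-context pass over the same positions, via a KL consistency term. All names here (fragment_to_long_alignment_loss, fragment_spans, and so on) are hypothetical illustrations, not the paper's actual API or formulation.

```python
# Hypothetical sketch of a fragment-to-long alignment loss (not from the paper).
# Assumes: one long-context forward pass yields teacher logits over the full
# sequence, and several short-context passes yield student logits per fragment.
import torch
import torch.nn.functional as F

def fragment_to_long_alignment_loss(
    long_logits: torch.Tensor,                 # (seq_len, vocab): full long-context pass
    fragment_logits: list[torch.Tensor],       # per-fragment (frag_len, vocab) logits
    fragment_spans: list[tuple[int, int]],     # (start, end) of each fragment in the sequence
) -> torch.Tensor:
    """Average KL divergence between long-context token distributions (teacher)
    and the fragment-level distributions (student) over the same positions."""
    losses = []
    for logits, (start, end) in zip(fragment_logits, fragment_spans):
        teacher = long_logits[start:end].log_softmax(dim=-1)  # long-context targets
        student = logits.log_softmax(dim=-1)                  # fragment predictions
        # KL(teacher || student), averaged over the fragment's positions
        losses.append(
            F.kl_div(student, teacher, log_target=True, reduction="batchmean")
        )
    return torch.stack(losses).mean()
```

Under this reading, the loss pulls each short-context fragment toward the behavior of the unified long-context pass, which would explain the reported fragment-long consistency (FLC) gains; the cross-fragment memory anchors would then supply the inter-fragment state that plain chunking loses.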
Submission Number: 62