Keywords: State Space Models, Question-answering, Long-context Reading Comprehension
TL;DR: We introduce a distributed document processing framework that merges independently computed document hidden states from fine-tuned Mamba models, enabling efficient inference across corpora.
Abstract: We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post hoc to support downstream reasoning. Inspired by model souping, we propose a strategy where documents are encoded independently and their representations are pooled, via simple operations like averaging, into a single context state. This approach, which we call document souping, enables modular encoding and reuse without reprocessing the full input for each query. We demonstrate that fine-tuned Mamba2 models with souped representations achieve performance competitive with or superior to the standard monolithic encoding approach across multi-hop QA, sparse retrieval, and long-document reasoning tasks. For example, on the RACE and QuALITY benchmarks for long-document question answering, our method substantially outperforms a traditional concatenation approach. Crucially, this modular design scales to hundreds of documents (we test up to 256) while delivering substantial savings in inference cost, unlocking new possibilities for large-scale corpus reasoning.
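To make the pooling operation concrete, below is a minimal sketch of the averaging step. The `encode_document` function is a hypothetical placeholder standing in for a fine-tuned Mamba2 encoder that returns a final SSM hidden state; this is an illustration of the idea under those assumptions, not the authors' implementation.

```python
import torch

# Hypothetical stand-in for a fine-tuned Mamba2 encoder: in the paper's setting
# this would return the model's final SSM hidden state after reading one
# document. Here it is simulated with a random vector for illustration only.
def encode_document(doc_tokens: torch.Tensor, d_state: int = 128) -> torch.Tensor:
    return torch.randn(d_state)

# "Document souping": encode each document independently (parallelizable and
# cacheable), then pool the per-document hidden states by simple averaging
# into a single context state for downstream reasoning.
def soup_documents(docs: list[torch.Tensor]) -> torch.Tensor:
    states = torch.stack([encode_document(d) for d in docs])  # (num_docs, d_state)
    return states.mean(dim=0)                                 # (d_state,)

# Usage: pool 256 documents without ever concatenating their tokens.
docs = [torch.randint(0, 50_000, (512,)) for _ in range(256)]
context_state = soup_documents(docs)
print(context_state.shape)  # torch.Size([128])
```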
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 19975