Keywords: Generalization Bounds, State-Space Models, Selective Scan, Stability, Transformers, Rademacher Complexity, Covering Numbers
TL;DR: The paper derives generalization bounds for selective SSMs using connections to self-attention, showing that spectral properties of the state matrix influence generalization.
Abstract: State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model’s stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior.
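Background sketch (not part of the submission; a minimal illustration under the standard Mamba-style formulation, which the abstract appears to assume): a selective SSM evolves a hidden state as $\dot{h}(t) = A\,h(t) + B(x_t)\,x(t)$ with input-dependent readout $y(t) = C(x_t)\,h(t)$, and the spectral abscissa of the state matrix is $\alpha(A) = \max_i \operatorname{Re}\,\lambda_i(A)$. Under zero-order-hold discretization with step $\Delta_t$, the state-transition factor is $\bar{A}_t = \exp(\Delta_t A)$, so $\|\bar{A}_t\| \approx e^{\Delta_t\,\alpha(A)}$: if $\alpha(A) < 0$ the recursion stays contractive as sequence length grows, whereas $\alpha(A) > 0$ allows state magnitudes to grow with length, which is the sense in which spectral properties of $A$ can govern training stability and length generalization.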
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 13619