Keywords: Generalization Bounds, State-Space Models, Selective Scan, Stability, Transformers, Rademacher Complexity, Covering Numbers
TL;DR: The paper derives generalization bounds for selective SSMs using connections to self-attention, showing that spectral properties of the state matrix influence generalization.
Abstract: State-space models (SSMs) have recently emerged as a compelling alternative to Transformers for sequence modeling tasks. This paper presents a theoretical generalization analysis of selective SSMs, the core architectural component behind the Mamba model. We derive a novel covering number-based generalization bound for selective SSMs, building upon recent theoretical advances in the analysis of Transformer models. Using this result, we analyze how the spectral abscissa of the continuous-time state matrix influences the model’s stability during training and its ability to generalize across sequence lengths. We empirically validate our findings on a synthetic majority task, the IMDb sentiment classification benchmark, and the ListOps task, demonstrating how our theoretical insights translate into practical model behavior.
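Background sketch (not part of the submission; a minimal illustration under the standard Mamba-style formulation, which the abstract appears to assume): a selective SSM evolves a hidden state as $\dot{h}(t) = A\,h(t) + B(x_t)\,x(t)$ with input-dependent readout $y(t) = C(x_t)\,h(t)$, and the spectral abscissa of the state matrix is $\alpha(A) = \max_i \operatorname{Re}\,\lambda_i(A)$. Under zero-order-hold discretization with step $\Delta_t$, the state-transition factor is $\bar{A}_t = \exp(\Delta_t A)$, so $\|\bar{A}_t\| \approx e^{\Delta_t\,\alpha(A)}$: if $\alpha(A) < 0$ the recursion stays contractive as sequence length grows, whereas $\alpha(A) > 0$ allows state magnitudes to grow with length, which is the sense in which spectral properties of $A$ can govern training stability and length generalization.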
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 13619