Length-independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints

TMLR Paper 5304 Authors

05 Jul 2025 (modified: 10 Jul 2025) · Under review for TMLR · CC BY 4.0
Abstract: Deep SSM architectures such as S4, S5, and LRU consist of sequential blocks that combine State-Space Model (SSM) layers with neural network components, and they achieve excellent performance on long-range sequence tasks. In this paper we provide a PAC bound that holds for non-selective architectures with stable SSM blocks and does not depend on the length of the input sequence. Imposing stability on the SSM blocks is standard practice in the literature and is known to improve performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
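To make the stability constraint referenced in the abstract concrete, below is a minimal sketch of one common way such constraints are enforced in this model family: an LRU-style diagonal recurrence whose eigenvalues are parameterized as lambda = exp(-exp(nu) + i*theta), so that |lambda| < 1 holds for any real nu. This is an illustration under assumed conventions, not the paper's construction; the function name `stable_diagonal_ssm` and all shapes are hypothetical.

```python
import numpy as np

def stable_diagonal_ssm(u, nu_log, theta, B, C):
    """Apply a diagonal SSM block whose recurrence eigenvalues lie
    strictly inside the unit disk (LRU-style parameterization:
    lambda = exp(-exp(nu_log) + i*theta), hence
    |lambda| = exp(-exp(nu_log)) < 1 for any real nu_log).

    u: (T, d_in) real input sequence; returns (T, d_out) real outputs.
    Hypothetical illustration, not the paper's exact architecture.
    """
    lam = np.exp(-np.exp(nu_log) + 1j * theta)  # (n,) stable eigenvalues
    x = np.zeros_like(lam)                      # complex hidden state, (n,)
    ys = []
    for u_t in u:
        x = lam * x + B @ u_t                   # x_{t+1} = Lambda x_t + B u_t
        ys.append((C @ x).real)                 # y_t = Re(C x_t)
    return np.stack(ys)

# Usage with hypothetical shapes: the stronger the stability
# (larger exp(nu_log)), the faster past inputs are forgotten,
# which is the mechanism behind length-independent bounds.
rng = np.random.default_rng(0)
n, d_in, d_out, T = 8, 4, 4, 32
y = stable_diagonal_ssm(
    rng.standard_normal((T, d_in)),
    nu_log=rng.standard_normal(n),
    theta=rng.uniform(0.0, 2 * np.pi, n),
    B=(rng.standard_normal((n, d_in)) + 1j * rng.standard_normal((n, d_in))) / np.sqrt(2 * d_in),
    C=(rng.standard_normal((d_out, n)) + 1j * rng.standard_normal((d_out, n))) / np.sqrt(n),
)
assert y.shape == (T, d_out)
```

Because |lambda| < 1 by construction, the state's dependence on an input that occurred k steps ago decays like |lambda|^k, which is the intuition for why a bound can avoid any dependence on sequence length.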
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Nadav_Cohen1
Submission Number: 5304