Length-independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints

Published: 21 Oct 2025, Last Modified: 21 Oct 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Deep state-space model (SSM) architectures such as S4, S5, and LRU are built from sequential blocks that combine SSM layers with neural networks, and they achieve excellent performance in learning representations of long-range sequences. In this paper we provide a PAC bound on the generalization error of non-selective architectures with stable SSM blocks that does not depend on the length of the input sequence. Imposing stability on the SSM blocks is standard practice in the literature and is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
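To make the stability constraint concrete, below is a minimal NumPy sketch (not from the paper) of a diagonal SSM layer in the style of LRU, where each eigenvalue of the state-transition matrix is parameterized to have magnitude strictly below 1, so the recurrence is stable by construction. All names, shapes, and the specific exp(-exp(nu)) parameterization are illustrative assumptions; the paper's bound applies to stable SSM blocks generally, not to this particular implementation.

```python
import numpy as np

def stable_diagonal_ssm(u, nu, theta, B, C):
    """Run a diagonal linear SSM: x_{t+1} = A x_t + B u_t, y_t = Re(C x_t).

    Stability is enforced by construction: each eigenvalue of the diagonal
    state matrix A is lambda_j = exp(-exp(nu_j) + i * theta_j), so that
    |lambda_j| = exp(-exp(nu_j)) < 1 for any real nu_j (an LRU-style
    parameterization, used here only as one illustrative choice).
    """
    lam = np.exp(-np.exp(nu) + 1j * theta)  # eigenvalues strictly inside the unit disk
    T, _ = u.shape
    x = np.zeros(lam.shape[0], dtype=complex)
    ys = []
    for t in range(T):
        x = lam * x + B @ u[t]              # diagonal recurrence, O(n) per step
        ys.append((C @ x).real)             # real-valued readout
    return np.stack(ys)

# Tiny usage example (hypothetical sizes): 3-dim state, 2-dim input/output,
# length-100 input sequence.
rng = np.random.default_rng(0)
n, d, T = 3, 2, 100
y = stable_diagonal_ssm(
    u=rng.normal(size=(T, d)),
    nu=rng.normal(size=n),                  # exp(-exp(nu)) lies in (0, 1) for any nu
    theta=rng.uniform(0, 2 * np.pi, size=n),
    B=rng.normal(size=(n, d)) + 0j,
    C=rng.normal(size=(d, n)) + 0j,
)
print(y.shape)  # (100, 2)
```

In a deep architecture of the kind the abstract describes, blocks like this would be interleaved with neural-network layers; the degree of stability in the bound corresponds to how far the eigenvalue magnitudes sit below 1.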
Submission Length: Long submission (more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Nadav_Cohen1
Submission Number: 5304