Length-independent generalization bounds for deep SSM architectures via Rademacher contraction and stability constraints

Published: 21 Oct 2025, Last Modified: 21 Oct 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Deep state-space model (SSM) architectures such as S4, S5, and LRU are built from sequential blocks that combine SSM layers with neural networks, and they achieve excellent performance in learning representations of long-range sequences. In this paper we provide a PAC bound on the generalization error of non-selective architectures with stable SSM blocks that does not depend on the length of the input sequence. Imposing stability on the SSM blocks is standard practice in the literature and is known to help performance. Our results provide a theoretical justification for the use of stable SSM blocks, as the proposed PAC bound decreases as the degree of stability of the SSM blocks increases.
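To make the stability constraint concrete, below is a minimal NumPy sketch (not from the paper) of a diagonal SSM layer in the style of LRU, where each eigenvalue of the state-transition matrix is parameterized to have magnitude strictly below 1, so the recurrence is stable by construction. All names, shapes, and the specific exp(-exp(nu)) parameterization are illustrative assumptions; the paper's bound applies to stable SSM blocks generally, not to this particular implementation.

```python
import numpy as np

def stable_diagonal_ssm(u, nu, theta, B, C):
    """Run a diagonal linear SSM: x_{t+1} = A x_t + B u_t, y_t = Re(C x_t).

    Stability is enforced by construction: each eigenvalue of the diagonal
    state matrix A is lambda_j = exp(-exp(nu_j) + i * theta_j), so that
    |lambda_j| = exp(-exp(nu_j)) < 1 for any real nu_j (an LRU-style
    parameterization, used here only as one illustrative choice).
    """
    lam = np.exp(-np.exp(nu) + 1j * theta)  # eigenvalues strictly inside the unit disk
    T, _ = u.shape
    x = np.zeros(lam.shape[0], dtype=complex)
    ys = []
    for t in range(T):
        x = lam * x + B @ u[t]              # diagonal recurrence, O(n) per step
        ys.append((C @ x).real)             # real-valued readout
    return np.stack(ys)

# Tiny usage example (hypothetical sizes): 3-dim state, 2-dim input/output,
# length-100 input sequence.
rng = np.random.default_rng(0)
n, d, T = 3, 2, 100
y = stable_diagonal_ssm(
    u=rng.normal(size=(T, d)),
    nu=rng.normal(size=n),                  # exp(-exp(nu)) lies in (0, 1) for any nu
    theta=rng.uniform(0, 2 * np.pi, size=n),
    B=rng.normal(size=(n, d)) + 0j,
    C=rng.normal(size=(d, n)) + 0j,
)
print(y.shape)  # (100, 2)
```

In a deep architecture of the kind the abstract describes, blocks like this would be interleaved with neural-network layers; the degree of stability in the bound corresponds to how far the eigenvalue magnitudes sit below 1.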
Submission Length: Long submission (more than 12 pages of main content)
Supplementary Material: zip
Assigned Action Editor: ~Nadav_Cohen1
Submission Number: 5304