Keywords: mamba, mamba2, structured state-space model, transformer, attention
Abstract: Structured State-Space Duality (SSD) [Dao & Gu, ICML 2024] is an equivalence between a simple Structured State-Space Model (SSM) and a masked attention mechanism.
In particular, a state-space model with a scalar-times-identity state matrix is equivalent to masked self-attention with a $1$-semiseparable causal mask.
Consequently, the same sequence transformation (model) admits two algorithmic realizations:
a linear-time $O(T)$ recurrence or a quadratic-time $O(T^2)$ attention computation.
In this work, we formalize and generalize this duality:
(i) we extend SSD from the scalar-identity case to general diagonal SSMs (diagonal state matrices);
(ii) we show that these diagonal SSMs match the training complexity lower bounds of the scalar case while supporting richer dynamics;
(iii) we establish a necessary and sufficient condition under which an SSM is equivalent to $1$-semiseparable masked attention;
and (iv) we give a negative result: the duality cannot be extended to standard softmax attention, owing to rank explosion.
Together, these results strengthen the theoretical bridge between recurrent SSMs and Transformers, and widen the design space for expressive yet efficient models.
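As a concrete illustration of the duality stated above, the following is a minimal NumPy sketch. It assumes the scalar-times-identity SSM formulation $h_t = a_t h_{t-1} + B_t x_t$, $y_t = C_t^\top h_t$ (the abstract does not fix notation, so the variable names here are illustrative) and checks that the $O(T)$ recurrence and the $O(T^2)$ masked-attention form with $1$-semiseparable mask $L_{ts} = a_{s+1}\cdots a_t$ produce identical outputs.

```python
import numpy as np

# Sketch of the SSD duality for a scalar-times-identity SSM
# (assumed notation: h_t = a_t * h_{t-1} + B_t * x_t,  y_t = C_t^T h_t).
rng = np.random.default_rng(0)
T, N = 6, 4                      # sequence length, state dimension
a = rng.uniform(0.5, 1.0, T)     # scalars a_t (state matrix A_t = a_t * I)
B = rng.standard_normal((T, N))  # input projections B_t
C = rng.standard_normal((T, N))  # output projections C_t
x = rng.standard_normal(T)       # scalar input channel

# Realization 1: linear-time recurrence, O(T) in sequence length.
h = np.zeros(N)
y_rec = np.zeros(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Realization 2: quadratic-time masked attention, O(T^2).
# L is the 1-semiseparable causal mask: L[t, s] = a_{s+1} * ... * a_t for s <= t.
L = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        L[t, s] = np.prod(a[s + 1 : t + 1])
M = (C @ B.T) * L                # masked "attention" matrix
y_attn = M @ x

assert np.allclose(y_rec, y_attn)  # both realizations give the same outputs
```

The check passes because unrolling the recurrence gives $y_t = \sum_{s \le t} (C_t^\top B_s)\, a_{s+1}\cdots a_t\, x_s$, which is exactly the masked-attention product $M x$.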
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3814