Keywords: State Space Models, Memory Decay, Long-Range Dependencies, Jordan Normal Form, Spectral Theory, Mamba, S4, Sequence Modeling, Forecasting, Dynamical Systems, Expressivity, Control Theory
TL;DR: Stable SSMs do not forget purely exponentially. We prove that memory decays as $\Theta(k^{m-1}\rho^k)$, where $\rho$ is the spectral radius and $m$ is the size of the dominant Jordan block. Defective Jordan structure creates a long transient memory that improves associative recall but hurts counting-based formal language tasks.
Abstract: Linear State Space Models (SSMs) such as S4 and Mamba are widely believed to possess exponentially decaying memory whenever the transition matrix is stable. We show that this characterization is incomplete. By analyzing the Jordan normal form of the transition operator, we prove that memory retention in linear SSMs follows the sharp asymptotic law $\Theta(k^{m-1}\rho^k)$, where $\rho$ is the spectral radius and $m$ is the size of the dominant Jordan block. This reveals that defective, non-diagonalizable dynamics induce a polynomial transient phase that can dramatically extend the effective context horizon beyond predictions based solely on spectral radius. We further derive matching upper and lower bounds for both hidden-state and input-output memory, establish a controllability--observability characterization through Gramians, and generalize the analysis to time-varying gated architectures including Mamba-style selective SSMs. Empirically, we uncover a fundamental tradeoff: defective transition matrices substantially improve associative recall but degrade formal language counting tasks that require independent state tracking. We additionally show that LayerNorm suppresses transient numerical explosion while preserving the memory extension effect. Our results provide a sharp constructive theory of memory in SSMs and yield concrete architectural principles for long-horizon sequence modeling and forecasting.
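The $\Theta(k^{m-1}\rho^k)$ law can be checked numerically on the simplest defective case, a single $m \times m$ Jordan block $J$ with eigenvalue $\lambda$, whose powers have the closed form $(J^k)_{ij} = \binom{k}{j-i}\lambda^{k-(j-i)}$ for $0 \le j-i \le k$. The sketch below is an illustration under that single-block assumption, not the paper's experimental code; the function names `jordan_power`, `max_abs_entry`, and `decay_ratio` are hypothetical.

```python
import math


def jordan_power(lam, m, k):
    """Return J^k for one m x m Jordan block with eigenvalue lam.

    Uses the closed form (J^k)[i][j] = C(k, j-i) * lam**(k-(j-i))
    for 0 <= j-i <= k, and zero elsewhere.
    """
    return [
        [
            math.comb(k, j - i) * lam ** (k - (j - i)) if 0 <= j - i <= k else 0.0
            for j in range(m)
        ]
        for i in range(m)
    ]


def max_abs_entry(matrix):
    """Largest absolute entry, used here as a simple matrix norm."""
    return max(abs(x) for row in matrix for x in row)


def decay_ratio(lam, m, k):
    """Ratio of max|J^k| to k^(m-1) * |lam|^k.

    If the Theta(k^(m-1) rho^k) law holds, this ratio stays bounded
    away from zero and infinity as k grows.
    """
    return max_abs_entry(jordan_power(lam, m, k)) / (k ** (m - 1) * abs(lam) ** k)


if __name__ == "__main__":
    # Compare a diagonalizable block (m = 1) with a defective one (m = 3).
    for k in (10, 100, 1000):
        print(k, decay_ratio(0.9, 1, k), decay_ratio(0.9, 3, k))
```

For large $k$ the top-right entry dominates, so the ratio tends to $1/((m-1)!\,|\lambda|^{m-1})$, a nonzero constant, which is what the matching upper and lower bounds assert in this special case. The envelope $k^{m-1}\rho^k$ itself peaks near $k \approx (m-1)/\ln(1/\rho)$, which gives a rough sense of how long the polynomial transient lasts before exponential decay takes over.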
Submission Number: 153