Keywords: selective state space models, Mamba, Volterra equations, continuum limits, interacting particle systems
TL;DR: We show that the many-token limit of Mamba is a causal Volterra equation with an explicit memory kernel, unlike the mean-field attention limits of transformers.
Abstract: Transformers admit continuum descriptions based on mean-field interactions, but selective state space models fall into a different class.
We show that the many-token limit of an input-conditioned, single-head SISO Mamba-3 block is a causal Volterra equation on the sphere with an explicit exponential memory kernel. The key reason is that Mamba's causal mask is chain-dependent: the influence of one token on another is transmitted through the full sequence of intermediate gates rather than a single pairwise weight.
Under a constant-horizon scaling, we prove convergence to this limit, validate it numerically, and show that the same framework covers the SISO Mamba-3 rotation, with Mamba-2 as a special case. We also characterize the resulting memory-horizon interpolation and relate the kernel scale to pretrained Mamba-2 heads.
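The chain-dependent mask described in the abstract can be illustrated with a minimal sketch. It assumes a generic scalar selective recurrence h_t = a_t h_{t-1} + b_t x_t, y_t = c_t h_t, which is not the paper's exact block (no sphere constraint, no rotation). The names `scan_recurrent`, `chain_kernel`, `lam`, and the per-token gate `exp(-lam / n)` are hypothetical choices for illustration only. The sketch checks that the recurrence equals a causal kernel whose entry (t, s) is the product of all gates between s and t, and that constant gates give an exponential kernel.

```python
import numpy as np

def scan_recurrent(a, b, c, x):
    """Scalar gated recurrence: h_t = a_t * h_{t-1} + b_t * x_t, y_t = c_t * h_t."""
    h = 0.0
    y = np.empty_like(x)
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y

def chain_kernel(a, b, c):
    """Causal kernel K[t, s] = c_t * (prod_{r=s+1..t} a_r) * b_s for s <= t, else 0.

    Assumes all gates a_r are strictly positive so that logs are defined.
    """
    log_a = np.cumsum(np.log(a))
    # exp(log_a[t] - log_a[s]) equals the product of gates a_{s+1}, ..., a_t.
    gate_products = np.exp(log_a[:, None] - log_a[None, :])
    return np.tril(gate_products) * c[:, None] * b[None, :]

rng = np.random.default_rng(0)
n = 64
a = rng.uniform(0.5, 1.0, n)   # input-dependent gates (random stand-ins here)
b = rng.normal(size=n)
c = rng.normal(size=n)
x = rng.normal(size=n)

y_rec = scan_recurrent(a, b, c, x)
K = chain_kernel(a, b, c)
y_ker = K @ x
max_err = np.max(np.abs(y_rec - y_ker))

# Constant gates with per-token decay lam / n (an assumed scaling, for illustration):
# the chain product collapses to an exponential kernel exp(-lam * (t - s) / n).
lam = 2.0
a_const = np.full(n, np.exp(-lam / n))
K_const = chain_kernel(a_const, np.ones(n), np.ones(n))
```

In this toy setting the weight from token s to token t depends on every gate in between, which is what distinguishes it from a single pairwise attention weight; with constant gates the memory decays over a fixed fraction of the sequence regardless of n.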
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 9