Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling

Published: 11 Jun 2025, Last Modified: 10 Jul 2025, ES-FoMo III, CC BY 4.0
Keywords: structural state space model, non-linear RNN, liquid neural network, parallelization, sequence modeling
TL;DR: LrcSSM is a nonlinear RNN with parallel prefix-scan updates, a formal gradient-stability guarantee, and compute-optimal scaling. It outperforms LRU, S5, and Mamba on long-range tasks, matching SSM speed without attention or FFT overhead.
Abstract: We present LrcSSM, a $\textit{nonlinear}$ recurrent model that processes long sequences as fast as today’s linear state-space layers. By forcing the state-transition matrix to be diagonal and learned at every step, the full sequence can be solved in parallel with a single prefix-scan, giving $\mathcal{O}(TD)$ time and memory and only $\mathcal{O}(\log T)$ sequential depth, for input-sequence length $T$ and state dimension $D$. Moreover, LrcSSM offers a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 and Mamba do not provide. Lastly, for network depth $L$, the forward and backward passes cost $\Theta(T\,D\,L)$ FLOPs; combined with its low sequential depth and parameter count of $\Theta(D\,L)$, the model follows the compute-optimal scaling-law regime ($\beta \approx 0.42$) recently observed for Mamba, outperforming quadratic-attention Transformers at equal compute while avoiding the memory overhead of FFT-based long convolutions. We show that on a series of long-range forecasting tasks, LrcSSM outperforms LRU, S5, and Mamba.
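To illustrate the parallelization idea in the abstract, here is a minimal JAX sketch of a diagonal, input-varying linear recurrence $h_t = a_t \odot h_{t-1} + b_t$ solved for all timesteps at once with a single associative (prefix) scan, yielding $\mathcal{O}(\log T)$ sequential depth. This is an assumption-laden illustration of the general technique, not the paper's implementation; the names `combine`, `scan_diagonal_recurrence`, `a`, and `b` are hypothetical.

```python
# Sketch only: diagonal input-dependent recurrence h_t = a_t * h_{t-1} + b_t,
# computed in parallel via jax.lax.associative_scan (prefix scan).
# Assumes h_0 = 0; all names here are illustrative, not from the paper.
import jax
import jax.numpy as jnp

def combine(left, right):
    # Compose two affine maps h -> a*h + b: apply `left` (earlier steps), then `right`.
    a_l, b_l = left
    a_r, b_r = right
    return a_r * a_l, a_r * b_l + b_r

def scan_diagonal_recurrence(a, b):
    """a, b: arrays of shape (T, D) with per-step diagonal transitions and inputs.
    Returns all hidden states h_1..h_T, shape (T, D)."""
    _, h = jax.lax.associative_scan(combine, (a, b))
    return h

# Example: T = 6 steps, D = 4 state channels.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.nn.sigmoid(jax.random.normal(key_a, (6, 4)))  # diagonal entries kept in (0, 1) for stability
b = jax.random.normal(key_b, (6, 4))                  # input-dependent drive
h = scan_diagonal_recurrence(a, b)
print(h.shape)  # (6, 4)
```

Because the scan operator is associative, the work is $\mathcal{O}(TD)$ while the critical path is only $\mathcal{O}(\log T)$, which is the property the abstract attributes to the diagonal, per-step-learned transition matrix.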
Submission Number: 153