Keywords: Deep Learning, Sequence Architecture, Recurrent Neural Network, State Space Model, Linear RNN, SSM, Mamba, Linear Attention, RWKV, DeltaNet, DeltaProduct, Looped Architecture, Recurrent Depth, Equilibrium Model, DEQ, Implicit Neural Network, Implicit Model, Neural ODE, Adaptive Computation Time, ACT, Test-Time Computation, Test-Time Compute, Reasoning, State-Tracking, A5, S5, Copying, catbAbI
TL;DR: We introduce the Fixed-Point RNN framework to solve state-tracking tasks by parameterizing the state transition matrix as implicitly dense.
Abstract: Linear recurrent neural networks (RNNs) and state-space models (SSMs) such as Mamba have become promising alternatives to softmax attention as sequence-mixing layers in Transformer architectures. Current models, however, do not exhibit the full state-tracking expressivity of RNNs because they rely on channel-wise (i.e., diagonal) sequence mixing.
In this paper, we investigate parameterizations of a large class of dense linear RNNs as fixed-points of parallelizable diagonal linear RNNs.
The resulting models can naturally trade expressivity for efficiency at a fixed parameter count,
and they achieve state-of-the-art results on the state-tracking benchmarks $A_5$ and $S_5$ while matching the performance of existing models on copying and other tasks.
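To make the core idea concrete, below is a minimal numerical sketch of how a dense linear RNN can be recovered as the fixed point of repeated diagonal (channel-wise) recurrences. This is our own illustration under a simple Jacobi-style splitting $A = D + R$ (diagonal plus off-diagonal residual), not the paper's actual parameterization or code; the function names `dense_rnn` and `fixed_point_rnn` are hypothetical.

```python
# Illustrative sketch (not the authors' exact method): a dense linear RNN
#   h_t = A h_{t-1} + B u_t,  with A = D + R  (D diagonal, R the off-diagonal rest),
# recovered as the fixed point of repeated diagonal recurrences. In each sweep the
# term R h_{t-1} is frozen at the previous iterate, so the inner scan is a purely
# diagonal (channel-wise, parallelizable) linear recurrence.
import numpy as np

def dense_rnn(A, B, U):
    """Reference dense linear RNN scan over inputs U of shape (T, d)."""
    d = A.shape[0]
    h, H = np.zeros(d), []
    for u in U:
        h = A @ h + B @ u
        H.append(h)
    return np.stack(H)

def fixed_point_rnn(A, B, U, n_iter=20):
    """Approximate the dense scan with n_iter diagonal scans (Jacobi-style splitting)."""
    d, T = A.shape[0], len(U)
    D = np.diag(np.diag(A))   # diagonal part: cheap, parallelizable recurrence
    R = A - D                 # dense residual, applied only to the previous iterate
    H = np.zeros((T, d))      # iterate over whole state sequences
    for _ in range(n_iter):
        h_prev_seq = np.vstack([np.zeros(d), H[:-1]])   # h_{t-1} from the last sweep
        h, H_new = np.zeros(d), []
        for t, u in enumerate(U):
            # diagonal recurrence; R @ h_prev_seq[t] enters as a known extra input
            h = D @ h + R @ h_prev_seq[t] + B @ u
            H_new.append(h)
        H = np.stack(H_new)
    return H

# Usage: the fixed point of the sweep map is exactly the dense recurrence, and for a
# contractive A (or enough sweeps) the iterates converge to it.
rng = np.random.default_rng(0)
d, T = 4, 16
A = 0.1 * rng.standard_normal((d, d))
B, U = rng.standard_normal((d, d)), rng.standard_normal((T, d))
print(np.max(np.abs(fixed_point_rnn(A, B, U) - dense_rnn(A, B, U))))  # ~0
```

In this reading, each sweep costs only a diagonal scan plus matrix-vector products on frozen values, which is one way to see how the number of fixed-point iterations can trade expressivity (closer to a dense RNN) against efficiency at a fixed parameter count.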
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 27826