Keywords: RNNs, Recurrent Neural Networks, SSMs, State Space Models, Expressivity, Formal Languages, Automata Theory, Algebraic Automata Theory, Metric Spaces, Topology, Mamba, Linear Recurrent Neural Networks, Cascades, Krohn and Rhodes
TL;DR: We present a unified theory for studying RNN expressivity, with novel results for several popular architectures and insights into the relationship between linear and non-linear RNNs.
Abstract: We propose Metric Automata Theory, an elegant generalisation of classical Automata Theory to continuous dynamical systems that serves as a unifying framework for all kinds of Recurrent Neural Networks (RNNs), including widely adopted architectures such as xLSTM and State Space Models (SSMs). The theory allows one to analyse RNNs seamlessly in both the finite- and unbounded-precision settings, while leveraging fundamental results of Automata Theory. It also provides a novel notion of robustness that guarantees numerical stability and contributes to the stability of learning. We employ the theory to prove a comprehensive set of expressivity results for widely adopted RNNs, with a focus on robustness and finite precision. Notably, we contrast the capabilities of xLSTM and SSMs for robustly modelling all star-free regular languages: xLSTM can do so, while SSMs cannot robustly recognise the FLIP-FLOP language. This gives a novel perspective on the importance of non-linear recurrences and insight into why xLSTM shows superior performance to SSMs on several tasks. We also provide an improved understanding of the capabilities of Mamba, a popular SSM. We show that Mamba is not generally capable of recognising the star-free languages under finite precision, which seemingly contradicts existing theoretical and empirical results for SSMs. We clarify the picture by showing that Mamba admits a piecewise-linearly separable state space that allows it to approximate star-free languages, with some length-generalisation abilities. At the same time, Mamba does not admit such state spaces for languages like Parity, which explains why Mamba empirically performs well on star-free languages and fails on Parity.
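To make the contrast drawn in the abstract concrete, here is a minimal illustrative sketch (not part of the submission itself): the flip-flop memory is a two-state automaton whose inputs only overwrite or preserve the state (an aperiodic, hence star-free, behaviour), whereas in Parity the input 1 permutes the two states, so its syntactic monoid contains the group Z_2 and the language is not star-free. The function names and the command alphabet below are hypothetical choices made purely for illustration.

```python
# Illustrative sketch: two 2-state automata contrasting the FLIP-FLOP
# language (star-free / aperiodic) with Parity (not star-free).

def flip_flop_state(commands):
    """Flip-flop memory: 'set0'/'set1' overwrite the stored bit, 'read' keeps it."""
    state = 0
    for c in commands:
        if c == "set0":
            state = 0
        elif c == "set1":
            state = 1
        # 'read' leaves the state unchanged; no input permutes the states.
    return state

def parity_state(bits):
    """Parity: the stored bit flips on every 1, so the input 1 acts as the group Z_2."""
    state = 0
    for b in bits:
        if b == 1:
            state ^= 1
    return state

if __name__ == "__main__":
    print(flip_flop_state(["set1", "read", "set0", "read"]))  # 0: last write was set0
    print(parity_state([1, 0, 1, 1]))                         # 1: odd number of ones
```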
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 23929