Uncovering the Computational Roles of Nonlinearity in Sequence Modeling

TMLR Paper5933 Authors

19 Sept 2025 (modified: 20 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Sequence modeling tasks across domains such as natural language processing, time-series forecasting, speech recognition, and control require complex computations. While nonlinear recurrence is required for universal sequence approximation, linear models have often proven surprisingly effective in practice, raising the question of when nonlinearity is truly necessary. In this study, we systematically dissect the functional role of nonlinearity in recurrent networks, identifying both when it is computationally necessary and which mechanisms it enables. We use Almost Linear Recurrent Neural Networks (AL-RNNs), which allow flexible control over the type and degree of nonlinearity, as a probe into the internal mechanisms of sequence models. We evaluate AL-RNNs across a diverse set of synthetic and real-world tasks, including classic sequence modeling benchmarks, an empirical neuroscientific stimulus-selection task, and a multi-task suite. We demonstrate how the AL-RNN's piecewise linear structure enables direct identification of computational primitives such as gating, rule-based integration, and memory-dependent transients, revealing that these operations emerge within predominantly linear dynamical backbones. Across tasks, sparse nonlinearity plays several functional roles: it improves interpretability by reducing and localizing nonlinear computations, promotes shared (rather than highly distributed) representations in multi-task settings, and reduces computational cost by limiting nonlinear operations. Moreover, sparse nonlinearity acts as a useful inductive bias: in low-data regimes, or when tasks require discrete switching between linear regimes, sparsely nonlinear models often match or exceed the performance of fully nonlinear architectures. Our framework bridges dynamical systems theory with the functional demands of long-range memory and structured computation in recurrent neural networks, with implications for both artificial and biological neural systems.
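For concreteness, here is a minimal sketch of a partially rectified recurrent update in the spirit of the AL-RNN described in the abstract. The exact parameterization used in the paper may differ; the function name `al_rnn_step` and the symbols `A`, `W`, `C`, `h`, and `P` are illustrative assumptions, not the authors' code.

```python
import numpy as np

def al_rnn_step(z, s, A, W, C, h, P):
    """One partially rectified recurrent update (AL-RNN-style sketch).

    The latent state evolves linearly except for the last P units,
    which pass through a ReLU before recurrent mixing.

    z : (M,)   latent state
    s : (K,)   external input
    A : (M, M) linear transition matrix (often diagonal in practice)
    W : (M, M) recurrent weights on the partially rectified state
    C : (M, K) input weights
    h : (M,)   bias
    P : number of nonlinear (ReLU) units, 0 <= P <= M
    """
    phi = z.copy()
    if P > 0:  # guard: phi[-0:] would rectify the whole state
        phi[-P:] = np.maximum(phi[-P:], 0.0)  # ReLU on the last P units only
    return A @ z + W @ phi + C @ s + h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, K, P = 8, 3, 2  # 8 latent units, 3 inputs, only 2 nonlinear units
    z = rng.standard_normal(M)
    s = rng.standard_normal(K)
    A = np.diag(rng.uniform(0.5, 0.95, M))  # stable diagonal linear backbone
    W = 0.1 * rng.standard_normal((M, M))
    C = 0.1 * rng.standard_normal((M, K))
    h = np.zeros(M)
    print(al_rnn_step(z, s, A, W, C, h, P).shape)  # (8,)
```

Under this reading, `P` is the knob controlling the degree of nonlinearity: `P = 0` recovers a purely linear state-space model, `P = M` a fully rectified piecewise linear RNN, and small `P` gives the sparsely nonlinear regime the abstract studies.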
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Changes are highlighted in yellow throughout the text.
Assigned Action Editor: ~William_T_Redman1
Submission Number: 5933