Identifying nonlinear dynamical systems with multiple time scales and long-range dependencies

Dominik Schmidt; Georgia Koppe; Zahra Monfared; Max Beutelspacher; Daniel Durstewitz

Published: 12 Jan 2021, Last Modified: 05 May 2023, ICLR 2021 Spotlight, Readers: Everyone

Keywords: nonlinear dynamical systems, recurrent neural networks, attractors, computational neuroscience, vanishing gradient problem, LSTM

Abstract: A main theoretical interest in biology and physics is to identify the nonlinear dynamical system (DS) that generated observed time series. Recurrent Neural Networks (RNNs) are, in principle, powerful enough to approximate any underlying DS, but in their vanilla form suffer from the exploding vs. vanishing gradients problem. Previous attempts to alleviate this problem resulted either in more complicated, mathematically less tractable RNN architectures, or strongly limited the dynamical expressiveness of the RNN. Here we address this issue by suggesting a simple regularization scheme for vanilla RNNs with ReLU activation which enables them to solve long-range dependency problems and express slow time scales, while retaining a simple mathematical structure which makes their DS properties partly analytically accessible. We prove two theorems that establish a tight connection between the regularized RNN dynamics and their gradients, illustrate on DS benchmarks that our regularization approach strongly eases the reconstruction of DS that harbor widely differing time scales, and show that our method is also on par with other long-range architectures like LSTMs on several tasks.
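
The abstract does not spell out the regularizer, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a ReLU RNN of the form z_next = A z + W relu(z) + h with diagonal A, and a quadratic penalty that pulls a chosen subset of units toward A_ii = 1, W_i: = 0, h_i = 0, so that those units act as near-perfect integrators with slow time scales. The names `rnn_step`, `slow_unit_penalty`, `n_reg`, and `tau` are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def rnn_step(z, A_diag, W, h):
    """One step of the assumed ReLU RNN: z_next = A z + W relu(z) + h, A diagonal."""
    phi = [relu(v) for v in z]
    n = len(z)
    return [
        A_diag[i] * z[i] + sum(W[i][j] * phi[j] for j in range(n)) + h[i]
        for i in range(n)
    ]


def slow_unit_penalty(A_diag, W, h, n_reg, tau):
    """Assumed quadratic penalty on the first n_reg units.

    Each regularized unit i is pulled toward A_ii = 1, W[i][:] = 0, h[i] = 0.
    tau scales the strength of the penalty (hypothetical hyperparameter).
    """
    total = 0.0
    for i in range(n_reg):
        total += (A_diag[i] - 1.0) ** 2
        total += sum(w * w for w in W[i])
        total += h[i] ** 2
    return tau * total


# Unit 0 sits exactly at the regularization target; unit 1 is unconstrained.
A_diag = [1.0, 0.5]
W = [[0.0, 0.0], [0.3, 0.0]]
h = [0.0, 0.1]

z_next = rnn_step([0.7, -0.2], A_diag, W, h)
penalty = slow_unit_penalty(A_diag, W, h, n_reg=1, tau=2.0)
```

In this toy setting the penalty for unit 0 is zero and the unit carries its state forward unchanged from one step to the next, which is the kind of behavior that would let information persist over long time spans. In training, such a penalty would be added to the task loss.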

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

One-sentence Summary: We introduce a novel regularization for ReLU-based vanilla RNNs that mitigates the exploding vs. vanishing gradient problem while retaining a simple mathematical structure that makes the RNN's dynamical systems properties partly analytically tractable.
