- Keywords: Recurrent Neural Networks, Nonlinear State Space Models, Generative Models, Long short-term memory, vanishing/exploding gradient problem, Nonlinear dynamics, Interpretable machine learning, Time series analysis
- TL;DR: We develop a new optimization approach for vanilla ReLU-based RNNs that enables long short-term memory and identification of arbitrary nonlinear dynamical systems with widely differing time scales.
- Abstract: Vanilla RNNs with ReLU activation have a simple structure that is amenable to systematic dynamical systems analysis and interpretation, but they suffer from the exploding vs. vanishing gradient problem. Recent attempts to retain this simplicity while alleviating the gradient problem are based on proper initialization schemes or orthogonality or unitarity constraints on the RNN’s recurrence matrix. These constraints, however, limit the RNN’s expressive power with regard to dynamical systems phenomena like chaos or multi-stability. Here, we instead suggest a regularization scheme that pushes part of the RNN’s latent subspace toward a line attractor configuration that enables long short-term memory and arbitrarily slow time scales. We show that our approach excels on a number of benchmarks, such as the sequential MNIST and multiplication problems, and enables reconstruction of dynamical systems that harbor widely different time scales.
- Code: https://www.dropbox.com/s/3iye6cox1ipl5vs/iclr_code.tar.gz?dl=1
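
The abstract does not spell out the form of the regularizer, so the following is only an illustrative sketch of what "pushing part of the latent subspace toward a line attractor" could look like. It assumes a latent update of the form `z_t = A * z_{t-1} + W @ relu(z_{t-1}) + h` with a diagonal `A`; the names `line_attractor_penalty`, `step`, `n_reg`, and `tau` are hypothetical and are not taken from the linked code.

```python
import numpy as np


def step(z, A_diag, W, h):
    """One latent update under the ASSUMED form z' = A*z + W relu(z) + h."""
    return A_diag * z + W @ np.maximum(z, 0.0) + h


def line_attractor_penalty(A_diag, W, h, n_reg, tau=1.0):
    """Hypothetical L2 penalty on the first n_reg latent units.

    It is zero exactly when those units have self-weight 1, no ReLU
    coupling, and no bias, i.e. when they simply hold their value.
    """
    a = A_diag[:n_reg]   # self-connections of the regularized units
    w = W[:n_reg, :]     # incoming ReLU couplings of those units
    b = h[:n_reg]        # biases of those units
    return tau * (np.sum((a - 1.0) ** 2) + np.sum(w ** 2) + np.sum(b ** 2))


# Usage: a perfectly regularized configuration keeps the state unchanged.
n = 4
A_diag, W, h = np.ones(n), np.zeros((n, n)), np.zeros(n)
z0 = np.array([0.3, -1.2, 2.0, 0.0])
z1 = step(z0, A_diag, W, h)
penalty_at_attractor = line_attractor_penalty(A_diag, W, h, n_reg=2)
penalty_off_attractor = line_attractor_penalty(0.5 * np.ones(n), W, h, n_reg=2)
```

In this sketch the penalty would be added to the training loss, so only the first `n_reg` units are drawn toward the memory-preserving configuration while the remaining units stay unconstrained.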