- Abstract: Recurrent Neural Networks (RNNs) applied to long sequences suffer from the well known vanishing and exploding gradients problem. The recently proposed Unitary Evolution Recurrent Neural Network (uRNN) alleviates the exploding gradient problem and can learn very long dependencies, but its nonlinearities make it still affected by the vanishing gradient problem and so learning can break down for extremely long dependencies. We propose a new RNN transition architecture where the hidden state is updated multiplicatively by a time invariant orthogonal transformation followed by an input modulated orthogonal transformation. There are no additive interactions and so our architecture exactly preserves forward hid-den state activation norm and backwards gradient norm for all time steps, and is provably not affected by vanishing or exploding gradients. We propose using the rotation plane parameterization to represent the orthogonal matrices. We validate our model on a simplified memory copy task and see that our model can learn dependencies as long as 5,000 timesteps.
- TL;DR: Recurrent equation for RNNs that uses the composition of two orthogonal transitions, one time invariant and one modulated by input, that doesn't suffer from vanishing or exploding gradients.
- Keywords: Deep learning, Theory
- Conflicts: berkeley.edu, openai.com