Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies
Nov 03, 2017 (modified: Nov 03, 2017)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:Recurrent neural networks (RNNs) have achieved state-of-the-art performance on many diverse tasks, from machine translation to surgical activity recognition, yet training RNNs to capture long-term dependencies remains difficult. To date, the vast majority of successful RNN architectures alleviate this problem using nearly-additive connections between states, as introduced by long short-term memory (LSTM). We take an orthogonal approach and introduce MIST RNNs, a NARX RNN architecture that allows direct connections from the very distant past. We show that MIST RNNs 1) exhibit superior vanishing-gradient properties in comparison to LSTM and previously-proposed NARX RNNs; 2) are far more efficient than previously-proposed NARX RNN architectures, requiring even fewer computations than LSTM; and 3) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies.
TL;DR:We introduce MIST RNNs, which a) exhibit superior vanishing-gradient properties in comparison to LSTM; b) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies; and c) are much more efficient than previously-proposed NARX RNNs, with even fewer parameters and operations than LSTM.
Keywords:recurrent neural networks, long-term dependencies, long short-term memory, LSTM
Enter your feedback below and we'll get back to you as soon as possible.