Keywords: LMU, LSTM, RNN, NLP, Transformers, Feedforward Training
Abstract: Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on psMNIST and other datasets. Here we consider a modified version of the LMU, named ff-LMU, the core of which is a linear time-invariant (LTI) system. We first show that the ff-LMU can be trained in a purely feedforward manner and yet executed during inference in a recurrent fashion. Specifically, we demonstrate that it trains about 80x faster than LSTM models of the same size. As a result, it overcomes the well-known limitations of training RNNs on GPUs that make them less scalable than feedforward networks such as transformers. Second, to validate its utility, we compare ff-LMU performance against LSTMs on five benchmarks drawn from the following categories: sentiment classification, semantic similarity, natural language inference, and image classification. Our models, despite their simplicity, achieve new state-of-the-art results for RNNs on psMNIST and QQP, and exhibit superior performance on the remaining three datasets while using up to 1000x fewer parameters. In general, ff-LMU models are highly parameter efficient. For instance, the first model on the current QQP leaderboard that beats ours is a transformer that uses 50,000x more parameters.
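The property that enables feedforward training of a fixed LTI system is that its state sequence equals a convolution of the input with the system's impulse response, so all timesteps can be computed in parallel during training while inference can still unroll the recurrence step by step. Below is a minimal NumPy sketch of this equivalence; the matrices here are arbitrary placeholders, not the LMU's Legendre-derived state-space matrices, and this is not the authors' implementation.

```python
import numpy as np

# Sketch (illustrative only): a discretized LTI memory
#   m_t = A_bar @ m_{t-1} + B_bar * u_t
# With A_bar, B_bar fixed, the state at time t is
#   m_t = sum_{k<=t} A_bar^(t-k) @ B_bar * u_k,
# i.e. a convolution of the input with the impulse response
# H_k = A_bar^k @ B_bar. This allows parallel ("feedforward")
# computation at training time and recurrent execution at inference.

rng = np.random.default_rng(0)
d, T = 4, 16                                  # state dimension, sequence length
A_bar = 0.9 * np.eye(d) + 0.01 * rng.standard_normal((d, d))  # placeholder matrices
B_bar = rng.standard_normal(d)
u = rng.standard_normal(T)                    # scalar input sequence

# Recurrent execution (inference-style)
m = np.zeros(d)
rec_states = []
for t in range(T):
    m = A_bar @ m + B_bar * u[t]
    rec_states.append(m)
rec_states = np.stack(rec_states)             # shape (T, d)

# Feedforward execution (training-style): states as a convolution
H = np.stack([np.linalg.matrix_power(A_bar, k) @ B_bar for k in range(T)])  # (T, d)
ff_states = np.stack([sum(H[t - k] * u[k] for k in range(t + 1)) for t in range(T)])

assert np.allclose(rec_states, ff_states)     # both paths give identical states
```

In practice the convolution would be computed with batched matrix products or FFTs rather than the explicit loop above, which is what removes the sequential bottleneck of standard RNN training on GPUs.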
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=wiIssh9_-t