Keywords: LMU, LSTM, RNN, NLP, Transformers, Feedforward Training
Abstract: Recently, a new recurrent neural network (RNN) named the Legendre Memory Unit (LMU) was proposed and shown to achieve state-of-the-art performance on psMNIST and other datasets. Here we consider a modified version of the LMU, named ff-LMU, the core of which is a linear time-invariant (LTI) system. We first show that the ff-LMU can be trained in a purely feedforward manner and yet executed during inference in a recurrent fashion. Specifically, we demonstrate that it trains about 80x faster than LSTM models of the same size. As a result, it overcomes the well-known limitations of training RNNs on GPUs that make them less scalable than feedforward networks such as transformers. Second, to validate its utility, we compare ff-LMU performance against LSTMs on five benchmarks drawn from the following categories: sentiment classification, semantic similarity, natural language inference, and image classification. Our models, despite their simplicity, achieve new state-of-the-art results for RNNs on psMNIST and QQP, and exhibit superior performance on the remaining three datasets while using up to 1000x fewer parameters. In general, ff-LMU models are highly parameter efficient. For instance, the first model on the current QQP leaderboard that beats ours is a transformer that uses 50,000x more parameters.
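The property that enables feedforward training of a fixed LTI system is that its state sequence equals a convolution of the input with the system's impulse response, so all timesteps can be computed in parallel during training while inference can still unroll the recurrence step by step. Below is a minimal NumPy sketch of this equivalence; the matrices here are arbitrary placeholders, not the LMU's Legendre-derived state-space matrices, and this is not the authors' implementation.

```python
import numpy as np

# Sketch (illustrative only): a discretized LTI memory
#   m_t = A_bar @ m_{t-1} + B_bar * u_t
# With A_bar, B_bar fixed, the state at time t is
#   m_t = sum_{k<=t} A_bar^(t-k) @ B_bar * u_k,
# i.e. a convolution of the input with the impulse response
# H_k = A_bar^k @ B_bar. This allows parallel ("feedforward")
# computation at training time and recurrent execution at inference.

rng = np.random.default_rng(0)
d, T = 4, 16                                  # state dimension, sequence length
A_bar = 0.9 * np.eye(d) + 0.01 * rng.standard_normal((d, d))  # placeholder matrices
B_bar = rng.standard_normal(d)
u = rng.standard_normal(T)                    # scalar input sequence

# Recurrent execution (inference-style)
m = np.zeros(d)
rec_states = []
for t in range(T):
    m = A_bar @ m + B_bar * u[t]
    rec_states.append(m)
rec_states = np.stack(rec_states)             # shape (T, d)

# Feedforward execution (training-style): states as a convolution
H = np.stack([np.linalg.matrix_power(A_bar, k) @ B_bar for k in range(T)])  # (T, d)
ff_states = np.stack([sum(H[t - k] * u[k] for k in range(t + 1)) for t in range(T)])

assert np.allclose(rec_states, ff_states)     # both paths give identical states
```

In practice the convolution would be computed with batched matrix products or FFTs rather than the explicit loop above, which is what removes the sequential bottleneck of standard RNN training on GPUs.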
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=wiIssh9_-t