Multiplicative LSTM for sequence modelling

Ben Krause; Iain Murray; Steve Renals; Liang Lu

23 Aug 2025 (modified: 22 Jun 2025) | ICLR 2017 Invite to Workshop | Readers: Everyone

TL;DR: Combines LSTM and multiplicative RNN architectures; achieves 1.19 bits/character on Hutter prize dataset with dynamic evaluation.

Abstract: We introduce multiplicative LSTM (mLSTM), a novel recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. mLSTM is characterised by its ability to have different recurrent transition functions for each possible input, which we argue makes it more expressive for autoregressive density estimation. We demonstrate empirically that mLSTM outperforms standard LSTM and its deep variants for a range of character level modelling tasks, and that this improvement increases with the complexity of the task. This model achieves a test error of 1.19 bits/character on the last 4 million characters of the Hutter prize dataset when combined with dynamic evaluation.
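To make the abstract's "different recurrent transition functions for each possible input" concrete, here is a minimal sketch of a single mLSTM step in plain Python. The equations are an assumption based on the commonly cited mLSTM formulation and are not stated on this page: an intermediate state m_t = (W_mx x_t) ⊙ (W_mh h_{t-1}) replaces h_{t-1} as the recurrent input to an otherwise standard LSTM unit. Biases are omitted, and the parameter names, initialisation range, and helper functions (`init_params`, `mlstm_step`, `one_hot`) are hypothetical choices made for illustration only.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(index, size):
    """Encode a symbol index as a one-hot input vector."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def init_params(n_in, n_hid, seed=0):
    """Small random weights; names and ranges are illustrative assumptions."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {"W_mx": mat(n_hid, n_in), "W_mh": mat(n_hid, n_hid)}
    for gate in ("h", "i", "f", "o"):
        params["W_" + gate + "x"] = mat(n_hid, n_in)   # input weights
        params["W_" + gate + "m"] = mat(n_hid, n_hid)  # weights on m_t
    return params


def mlstm_step(params, x, h_prev, c_prev):
    """One mLSTM step under the assumed formulation (no biases)."""
    # Multiplicative intermediate state: the input rescales each dimension
    # of the projected hidden state, so the effective hidden-to-hidden
    # transition differs for every input symbol.
    a = matvec(params["W_mx"], x)
    b = matvec(params["W_mh"], h_prev)
    m = [ai * bi for ai, bi in zip(a, b)]

    def pre_activation(gate):
        u = matvec(params["W_" + gate + "x"], x)
        v = matvec(params["W_" + gate + "m"], m)
        return [ui + vi for ui, vi in zip(u, v)]

    h_hat = pre_activation("h")
    i = [sigmoid(z) for z in pre_activation("i")]  # input gate
    f = [sigmoid(z) for z in pre_activation("f")]  # forget gate
    o = [sigmoid(z) for z in pre_activation("o")]  # output gate

    # Standard LSTM cell and hidden-state updates, driven by m_t.
    c = [fk * ck + ik * math.tanh(hk)
         for fk, ck, ik, hk in zip(f, c_prev, i, h_hat)]
    h = [math.tanh(ck) * ok for ck, ok in zip(c, o)]
    return h, c


# Run a short sequence over a hypothetical 3-symbol alphabet.
params = init_params(n_in=3, n_hid=4)
h, c = [0.0] * 4, [0.0] * 4
for symbol in [0, 2, 1]:
    h, c = mlstm_step(params, one_hot(symbol, 3), h, c)
```

With one-hot inputs, `W_mx x_t` simply selects one column of `W_mx`, so each character effectively picks its own diagonal rescaling of the recurrent weights. This factorisation gives input-dependent transitions without storing a full hidden-to-hidden matrix per character.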

Keywords: Deep learning, Unsupervised Learning, Natural language processing

Conflicts: ed.ac.uk, ttic.edu

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/multiplicative-lstm-for-sequence-modelling/code)
