Long Expressive Memory for Sequence Modeling

T. Konstantin Rusch; Siddhartha Mishra; N. Benjamin Erichson; Michael W. Mahoney

Long Expressive Memory for Sequence Modeling

T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney

Published: 28 Jan 2022, Last Modified: 22 Jun 2025ICLR 2022 SpotlightReaders: Everyone

Keywords: sequence modeling, long-term dependencies, multiscale ordinary differential equations, dynamical systems

Abstract: We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, it can efficiently process sequential tasks with very long-term dependencies, and it is sufficiently expressive to be able to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models.

One-sentence Summary: A novel method for sequence modeling based on multiscale ODEs that is provably able to learn very long-term dependencies while being sufficiently expressive to outperform state-of-the-art recurrent sequence models.

Supplementary Material: zip

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/long-expressive-memory-for-sequence-modeling/code)

17 Replies

Loading