mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

ICLR 2026 Conference Submission 19489 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: RNN, sequence processing, model architecture, embedded systems
Abstract: Processing temporal data directly at the sensor source demands models that capture both short- and long-range dynamics under tight memory constraints. While state-of-the-art (SotA) sequence models such as Transformers excel at these tasks, their quadratic memory scaling with sequence length makes them impractical for edge settings. Recurrent Neural Networks (RNNs) offer constant memory scaling but train slowly due to their sequential nature, and Temporal Convolutional Networks (TCNs), though efficiently trainable, have memory footprints that grow with kernel length. For more memory-efficient sequence modeling, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that combines a temporal convolution with learnable spacings and a gated recurrent component. The convolution with learnable spacings expresses a flexible delay embedding that captures rapid temporal variations, while the recurrent component maintains global context with minimal memory overhead. We theoretically ground and empirically validate our approach on two types of synthetic tasks, demonstrating that mGRADE effectively separates and preserves temporal features across multiple timescales. Furthermore, on the challenging Long-Range Arena (LRA) benchmark, mGRADE reduces the memory footprint by up to a factor of 8 while maintaining competitive performance relative to SotA models.
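
The abstract describes two complementary memory mechanisms: a temporal convolution whose tap delays are learnable (the delay embedding) and a minimal gated recurrence that carries a single constant-size state. The sketch below is one plausible reading of such a block in PyTorch; the module names (DelayConv, MinGRU, HybridBlock), the number of taps, the linear-interpolation scheme for fractional delays, and the gate parameterization are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only; hyperparameters and module structure are assumed,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DelayConv(nn.Module):
    """Temporal mixing with a few taps at learnable real-valued delays.

    Each tap contributes a delayed copy of the input; delays are continuous
    parameters realized by linear interpolation between adjacent integer lags,
    so gradients flow into the delays through the interpolation weights.
    """

    def __init__(self, channels: int, n_taps: int = 4, max_delay: int = 32):
        super().__init__()
        self.max_delay = max_delay
        self.weight = nn.Parameter(torch.randn(channels, n_taps) / n_taps ** 0.5)
        self.delay = nn.Parameter(torch.rand(n_taps) * max_delay)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels); zero-pad the past so early steps are defined.
        B, T, C = x.shape
        xp = F.pad(x, (0, 0, self.max_delay, 0))
        out = torch.zeros_like(x)
        for k in range(self.weight.shape[1]):
            d = self.delay[k].clamp(0.0, float(self.max_delay - 1))
            lo = int(d.detach().floor())   # integer part of the delay
            frac = d - lo                  # fractional part (keeps delay trainable)
            a = xp[:, self.max_delay - lo: self.max_delay - lo + T, :]          # x[t - lo]
            b = xp[:, self.max_delay - lo - 1: self.max_delay - lo - 1 + T, :]  # x[t - lo - 1]
            out = out + ((1.0 - frac) * a + frac * b) * self.weight[:, k]
        return out


class MinGRU(nn.Module):
    """Minimal gated recurrence: h_t = (1 - z_t) * h_{t-1} + z_t * h_cand_t.

    Gate and candidate depend only on the current input, so the running state
    is a single hidden vector whose size is independent of sequence length.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.to_z = nn.Linear(channels, channels)
        self.to_h = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.sigmoid(self.to_z(x))    # (batch, time, channels)
        h_cand = self.to_h(x)
        h = x.new_zeros(x.shape[0], x.shape[2])
        outs = []
        for t in range(x.shape[1]):        # plain sequential scan for clarity
            h = (1.0 - z[:, t]) * h + z[:, t] * h_cand[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)


class HybridBlock(nn.Module):
    """Delay convolution for fast local structure, then a gated recurrence
    that holds global context in constant memory."""

    def __init__(self, channels: int, n_taps: int = 4, max_delay: int = 32):
        super().__init__()
        self.conv = DelayConv(channels, n_taps, max_delay)
        self.rnn = MinGRU(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.rnn(torch.relu(self.conv(x)))


if __name__ == "__main__":
    block = HybridBlock(channels=16)
    y = block(torch.randn(2, 128, 16))     # (batch, time, channels)
    print(y.shape)                         # torch.Size([2, 128, 16])
```

Stacking several such blocks and adding normalization, residual connections, or per-channel delays would bring this closer to a practical model; those details are not specified in the abstract and are omitted here.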
Primary Area: learning on time series and dynamical systems
Submission Number: 19489