Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching

Fernando Moreno-Pino; Alvaro Arroyo; Harrison Waldon; Xiaowen Dong; Alvaro Cartea

Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching

Fernando Moreno-Pino, Alvaro Arroyo, Harrison Waldon, Xiaowen Dong, Alvaro Cartea

Published: 25 Sept 2024, Last Modified: 11 Jan 2025NeurIPS 2024 posterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: efficient time series modelling, efficient sequence modelling, continuous-time temporal modelling, patching, self-attention, transformers, computational efficiency, spatial processing

TL;DR: We propose a way to make the self-attention mechanism operate on continuous-time representations of temporal signals and incurs significantly lower computational costs.

Abstract: Time-series data in real-world settings typically exhibit long-range dependencies and are observed at non-uniform intervals. In these settings, traditional sequence-based recurrent models struggle. To overcome this, researchers often replace recurrent models with Neural ODE-based architectures to account for irregularly sampled data and use Transformer-based architectures to account for long-range dependencies. Despite the success of these two approaches, both incur very high computational costs for input sequences of even moderate length. To address this challenge, we introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences and incurs significantly lower computational costs. In particular, we propose multi-view signature attention, which uses path signatures to augment vanilla attention and to capture both local and global (multi-scale) dependencies in the input data, while remaining robust to changes in the sequence length and sampling frequency and yielding improved spatial processing. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts while obtaining the representational benefits of Neural ODE-based models, all at a fraction of the computational time and memory resources.

Supplementary Material: zip

Primary Area: Deep learning architectures

Submission Number: 18156

Loading