Learning with Expected Signatures: Theory and Applications

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Oral · CC BY 4.0
Abstract: The expected signature maps a collection of data streams to a lower-dimensional representation, with a remarkable property: the resulting feature tensor can fully characterize the data-generating distribution. This "model-free" embedding has been successfully leveraged to build multiple domain-agnostic machine learning (ML) algorithms for time series and sequential data. The convergence results proved in this paper bridge the gap between the expected signature's empirical discrete-time estimator and its theoretical continuous-time value, allowing for a more complete probabilistic interpretation of expected-signature-based ML methods. Moreover, when the data-generating process is a martingale, we suggest a simple modification of the expected signature estimator with significantly lower mean squared error and empirically demonstrate how it can be effectively applied to improve predictive performance.
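To make the object of study concrete, here is a minimal sketch (not the authors' implementation from the linked repository) of the standard empirical expected-signature estimator: truncate the signature at level two, compute it for the piecewise-linear interpolation of each discretely sampled path, and average over paths. All function names are illustrative.

```python
import numpy as np

def truncated_signature(path):
    """Signature (levels 1 and 2) of the piecewise-linear interpolation
    of a discretely sampled path of shape (n_steps + 1, d)."""
    increments = np.diff(path, axis=0)        # Delta X_k, shape (n_steps, d)
    level1 = increments.sum(axis=0)           # level 1: X_T - X_0
    # Level 2 iterated integrals of the linear interpolation:
    # S2[i, j] = sum_k (X_{t_k} - X_0)_i Delta X_k^j
    #            + 0.5 * sum_k Delta X_k^i Delta X_k^j
    running = np.cumsum(increments, axis=0) - increments  # X_{t_k} - X_0
    level2 = running.T @ increments + 0.5 * increments.T @ increments
    return level1, level2

def expected_signature(paths):
    """Empirical expected signature: average the truncated signatures
    over a collection of paths of shape (n_paths, n_steps + 1, d)."""
    sigs = [truncated_signature(p) for p in paths]
    level1 = np.mean([s[0] for s in sigs], axis=0)
    level2 = np.mean([s[1] for s in sigs], axis=0)
    return level1, level2
```

The paper's convergence results concern precisely the gap between this discrete-time, finite-sample average and the continuous-time expected signature it approximates.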
Lay Summary: Stream data, also known as time series, path data, data sequences, or by countless other names, are a common way to represent quantities that evolve over time. These representations are often long and may include redundant information, which can make them challenging for standard machine learning models to handle effectively. The expected signature offers a way to summarize such data into a lower-dimensional representation (an embedding) that is easier to work with. This transformation does not rely on additional assumptions about how the data were generated and does not require setting extra parameters, making it broadly applicable. In the ideal case of continuous data sampled at infinite frequency, this embedding enjoys strong theoretical guarantees that also cover processes with highly irregular realizations, such as those observed in financial markets or complex physical systems. In practice, however, we only observe finitely many data points, sampled at discrete times. This paper provides general conditions under which the estimator computed from such data reliably converges to the ideal expected signature. We also propose a new estimator that improves on the standard one, both in its theoretical guarantees and in its empirical performance across a wide range of scenarios.
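As an illustrative usage example of the sketch above (again, not a result or experiment taken from the paper): for d-dimensional Brownian motion on [0, 1], the limiting (Stratonovich) expected signature is known in closed form, with level one equal to 0 and level two equal to 0.5·I, so the discrete-time estimator can be sanity-checked against it.

```python
# Reuses numpy and expected_signature from the sketch above.
rng = np.random.default_rng(0)
n_paths, n_steps, d = 10_000, 500, 2

# Simulate Brownian paths on [0, 1], sampled on a uniform grid.
steps = rng.normal(scale=np.sqrt(1.0 / n_steps), size=(n_paths, n_steps, d))
paths = np.concatenate(
    [np.zeros((n_paths, 1, d)), np.cumsum(steps, axis=1)], axis=1
)

level1, level2 = expected_signature(paths)
print(level1)  # approx. [0, 0]
print(level2)  # approx. 0.5 * np.eye(2)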
Link To Code: https://github.com/lorenzolucchese/esig
Primary Area: Theory->Probabilistic Methods
Keywords: Probabilistic Machine Learning, Signature, Expected Signature, Time Series, Rough Paths
Submission Number: 13314