Time Series Prediction via Similarity Search: Exploring Invariances, Distance Measures and Ensemble Functions

Antonio Rafael Sabino Parmezan, Vinícius M. A. de Souza, Gustavo E. A. P. A. Batista

2022 (modified: 02 Feb 2023), IEEE Access 2022, Readers: Everyone

Abstract: The rapid advance of scientific research in data mining has led to the adaptation of conventional pattern extraction methods to the context of time series analysis. The forecasting (or prediction) task has been supported mainly by regression algorithms based on artificial neural networks, support vector machines, and $k$-Nearest Neighbors ($k$NN). However, some studies provided empirical evidence that similarity-based methods, i.e., variations of $k$NN, constitute a promising approach compared with more complex predictive models from both machine learning and statistics. Although the scientific community has made great strides in increasing the visibility of these easy-to-fit and impressively accurate algorithms, previous work has failed to recognize the right invariances needed for this task. We propose a novel extension of $k$NN, namely $k$NN - Time Series Prediction with Invariances ($k$NN-TSPI), that differs from the literature by combining techniques to obtain amplitude and offset invariance, complexity invariance, and treatment of trivial matches. Our predictor enables more meaningful matches between reference queries and data subsequences. From a comprehensive evaluation with real-world datasets, we demonstrate that $k$NN-TSPI is a competitive algorithm against two conventional similarity-based approaches and, most importantly, against 11 popular predictors. To assist future research and provide a better understanding of similarity-based method behaviors, we also explore different settings of $k$NN-TSPI regarding invariances to distortions in time series, distance measures, complexity-invariant distances, and ensemble functions. Results show that $k$NN-TSPI stands out for its robustness and stability both concerning the parameter $k$ and the accuracy of the projection horizon trends.
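As a rough illustration of the three ingredients named in the abstract, the following sketch forecasts one step ahead with a similarity search. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`forecast_next`, `cid`, `znorm`), the use of z-normalization for amplitude and offset invariance, a Euclidean distance scaled by a complexity correction factor for complexity invariance, a greedy exclusion zone of one window length for trivial matches, and the mean as the ensemble function.

```python
import math


def znorm_params(x):
    """Return (mean, population standard deviation) of a window."""
    m = sum(x) / len(x)
    s = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return m, s


def znorm(x, eps=1e-12):
    """Z-normalize a window (assumed route to offset/amplitude invariance)."""
    m, s = znorm_params(x)
    if s < eps:  # constant window carries no shape information
        return [0.0] * len(x)
    return [(v - m) / s for v in x]


def complexity(x):
    """Complexity estimate: root of the summed squared first differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))


def cid(q, c, eps=1e-12):
    """Euclidean distance scaled by a complexity correction factor (assumed form)."""
    ed = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))
    cq, cc = complexity(q), complexity(c)
    return ed * (max(cq, cc) / max(min(cq, cc), eps))


def forecast_next(series, k=3, window=20):
    """One-step-ahead forecast from the k most similar past subsequences."""
    n = len(series)
    if n < 2 * window + 1:
        raise ValueError("series too short for the chosen window")
    query = series[n - window:]
    q_mean, q_std = znorm_params(query)
    q_norm = znorm(query)

    # Score every subsequence that has a known successor value.
    scored = []
    for i in range(n - window):
        cand = series[i:i + window]
        scored.append((cid(q_norm, znorm(cand)), i))
    scored.sort()

    # Greedy trivial-match handling: skip candidates that start within
    # one window length of an already selected neighbor.
    chosen = []
    for _, i in scored:
        if all(abs(i - j) >= window for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break

    # Map each neighbor's successor into the query's scale, then average.
    preds = []
    for i in chosen:
        c_mean, c_std = znorm_params(series[i:i + window])
        z = (series[i + window] - c_mean) / c_std if c_std > 1e-12 else 0.0
        preds.append(q_mean + z * q_std)
    return sum(preds) / len(preds)
```

For example, on a sine wave sampled with a period of 20 points, `forecast_next(series, k=3, window=20)` should return a value close to the true next sample, and scaling or shifting the whole series should scale or shift the forecast accordingly. The paper itself compares several distance measures and ensemble functions, so the choices above are only one possible configuration.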
