leSAX Index: A Learned SAX Representation Index for Time Series Similarity Search

Published: 2025 · Last Modified: 06 Feb 2026 · ICDE 2025 · License: CC BY-SA 4.0
Abstract: Time series similarity search (TSSS) is a fundamental task across many applications, including classification, motif discovery, and anomaly detection. Existing iSAX-based index methods, while known for their efficiency, rely on hand-crafted techniques (e.g., PAA and SAX) designed for z-normalized time series data. These techniques do not fully exploit the representation space and pose challenges for indexing. In this paper, we propose a learned index approach for TSSS. Specifically, we introduce SAXnet, a novel two-stage neural network that generates a learned SAX representation (leSAX representation) for both z-normalized and non-z-normalized time series data. The benefits of SAXnet are threefold: ① full exploitation of the latent space, ② preservation of time series shapes and global information for indexing, and ③ elimination of the need for hand-crafted techniques. We then propose the leSAX index, a novel learned SAX representation index, which consists of a leSAX tree and a learned index. The distribution of the leSAX representations in the leSAX tree is adjusted toward a near-uniform distribution for index efficiency. Furthermore, we propose a learned index structure that works alongside the leSAX tree and is applied recursively to large index leaf nodes. We have conducted comprehensive experiments on exact similarity search using SAXnet and the leSAX index on both real and synthetic time series datasets. The results demonstrate that our leSAX method outperforms state-of-the-art methods in efficiency, achieving speedups ranging from 3.6× to 17×.
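For context, the hand-crafted pipeline the paper contrasts with its learned representation is the classic PAA + SAX transform: z-normalize the series, average it into equal-length segments (PAA), then discretize each segment mean using breakpoints that split the standard normal into equiprobable regions. The sketch below is illustrative only (segment count, alphabet size, and function names are our assumptions, not the paper's):

```python
# Illustrative sketch of the classic PAA + SAX pipeline (not the paper's leSAX).
from statistics import NormalDist
import numpy as np

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    segments = np.array_split(np.asarray(series, dtype=float), n_segments)
    return np.array([seg.mean() for seg in segments])

def sax(series, n_segments=4, alphabet_size=4):
    """Classic SAX: z-normalize, reduce with PAA, discretize with N(0,1) breakpoints."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()  # z-normalization assumed by classic SAX
    approx = paa(x, n_segments)
    # Breakpoints splitting the standard normal into equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / alphabet_size)
                   for i in range(1, alphabet_size)]
    return np.searchsorted(breakpoints, approx)  # symbol indices 0..alphabet_size-1

# A monotonically increasing series maps to increasing symbols:
word = sax([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])  # → [0, 1, 2, 3]
```

Each series becomes a short symbolic word, which is what iSAX-style trees index; the paper's SAXnet replaces this fixed transform with a learned one.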