Abstract: Approximately repeated subsequences in a longer time series, i.e., time series motifs, are an important primitive in time series data mining. Motifs are used in dozens of downstream tasks, including classification, clustering, summarization, rule discovery, and segmentation. Time series motif discovery is a notoriously computationally expensive task. Some motif discovery algorithms are fast in the best case, but on other datasets, even when both the data length and the motif length are held fixed, their time and space complexity can explode. The Matrix Profile has the desirable property that its time and space complexity are independent of the data. Moreover, the Matrix Profile is fast enough for datasets with about one million datapoints, which covers a large fraction of use cases. However, there are situations where we may wish to consider much larger datasets. In this work, we introduce the first lower bound for the Matrix Profile and an algorithm that exploits that lower bound to allow orders-of-magnitude speedup for exact motif search on real-world datasets. We demonstrate the utility of our ideas with the largest and most ambitious motif discovery experiments ever attempted.