Matrix Profile XXXI: Motif Discovery Made Faster

Maryam Shahcheraghi; Chin-Chia Michael Yeh; Yan Zheng; Junpeng Wang; Zhongfang Zhuang; Liang Wang; Mahashweta Das; Eamonn J. Keogh

Published: 01 Jan 2024, Last Modified: 15 May 2025. Venue: ICKG 2024. License: CC BY-SA 4.0

Abstract: Approximately repeated subsequences in a longer time series, i.e., time series motifs, are an important primitive in time series data mining. Motifs are used in dozens of downstream tasks, including classification, clustering, summarization, rule discovery, and segmentation. Time series motif discovery is a notoriously computationally expensive task. Some motif discovery algorithms are fast in the best case, but on other datasets, even when both the data length and the motif length are held the same, their time and space complexity can explode. The Matrix Profile has the nice property that its time and space complexity are independent of the data. Moreover, the Matrix Profile is fast enough for datasets with about one million datapoints, which covers a large fraction of use cases. However, there are situations where we may wish to consider much larger datasets. In this work, we introduce the first lower bound for the Matrix Profile and an algorithm that exploits that lower bound to allow orders-of-magnitude speedup for exact motif search on real-world datasets. We demonstrate the utility of our ideas with the largest and most ambitious motif discovery experiments ever attempted.
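To make the terms in the abstract concrete, the following is a minimal sketch of a brute-force Matrix Profile and the top motif pair it implies. It is not the lower-bound algorithm introduced in the paper, which the abstract does not detail. The function names (`znorm`, `naive_matrix_profile`, `top_motif`), the use of z-normalized Euclidean distance, and the exclusion zone of half the subsequence length are all assumptions chosen for illustration.

```python
import numpy as np

def znorm(x):
    # Z-normalize a subsequence; a constant subsequence maps to all zeros.
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def naive_matrix_profile(ts, m):
    # Brute force: O(n^2 * m) time regardless of the data values.
    ts = np.asarray(ts, dtype=float)
    n_sub = len(ts) - m + 1
    subs = np.array([znorm(ts[i:i + m]) for i in range(n_sub)])
    excl = max(1, m // 2)  # assumed exclusion zone to skip trivial matches
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1)
    for i in range(n_sub):
        # Distance from subsequence i to every other subsequence.
        d = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        d[lo:hi] = np.inf  # ignore overlapping neighbors of i
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index

def top_motif(ts, m):
    # The top motif pair is the minimum of the Matrix Profile
    # together with its nearest neighbor.
    profile, index = naive_matrix_profile(ts, m)
    i = int(np.argmin(profile))
    return i, int(index[i]), float(profile[i])
```

Because every pair of subsequences is compared, the running time of this sketch depends only on the series length and the motif length, not on the values in the series. That is the data-independence property the abstract describes, and the quadratic cost is what makes series far beyond a million points impractical without pruning.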
