TSDDISCOVER: Discovering Data Dependency for Time Series Data

Published: 01 Jan 2024, Last Modified: 06 Aug 2024, ICDE 2024, CC BY-SA 4.0
Abstract: Intelligent devices often produce time series data with significant data quality issues. Data dependencies have proven useful for error detection and data repair, but existing dependencies do not adequately capture the data quality of time series datasets. Exploiting the distinctive characteristics of time series data, we introduce a novel data dependency, termed TSDD. It captures the contextual relationships embedded within multivariate time series, thereby enriching the semantics of data quality representations. We analyze the complexity of both the implication and consistency problems for TSDD reasoning, and develop a TSDD discovery algorithm, TSDDISCOVER, which consists of functional structure discovery, allowable error bound determination, and validation of TSDD patterns. Experimental results on real-life datasets verify that TSDDISCOVER efficiently discovers high-quality TSDD patterns. Compared with several leading data quality constraints, TSDD-based error detection achieves an average improvement of 12% in accuracy and 30% in F1 score over other dependency-based detection methods.
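The abstract does not give the formal definition of a TSDD, so the following is only a rough illustration of the three stages it names, not the TSDDISCOVER algorithm itself. It assumes a deliberately simplified stand-in: a linear relation between two attributes of a multivariate series, with a tolerance on the residual. All function names (`fit_linear`, `error_bound`, `violations`) and the `coverage` parameter are hypothetical.

```python
from statistics import mean

# ASSUMPTION: a "dependency" here is y ≈ a*x + b within a tolerance.
# This is a simplified stand-in, not the TSDD definition from the paper.

def fit_linear(xs, ys):
    """Stage 1 (stand-in for functional structure discovery):
    ordinary least-squares fit of ys ≈ a*xs + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not be constant")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx

def error_bound(xs, ys, a, b, coverage=1.0):
    """Stage 2 (stand-in for allowable error bound determination):
    smallest bound covering roughly a `coverage` fraction of the
    absolute residuals on the reference data."""
    residuals = sorted(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    idx = min(len(residuals) - 1, int(coverage * len(residuals)))
    return residuals[idx]

def violations(xs, ys, a, b, bound):
    """Stage 3 (stand-in for validation / error detection):
    indices whose residual exceeds the allowable bound."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (a * x + b)) > bound]

# Reference data: y = 2x + 1 with small alternating noise.
ref_x = list(range(20))
ref_y = [2 * x + 1 + (0.1 if x % 2 == 0 else -0.1) for x in ref_x]
a, b = fit_linear(ref_x, ref_y)
bound = error_bound(ref_x, ref_y, a, b)

# New readings with one corrupted value at index 3.
new_x = [0, 1, 2, 3, 4]
new_y = [1, 3, 5, 12, 9]
flagged = violations(new_x, new_y, a, b, bound)
```

In this sketch the bound is learned from reference data assumed to be mostly clean; a real discovery procedure would also need to handle temporal context and dirty training data, which this example ignores.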