Abstract: Detecting drifts in data is essential for machine learning applications, as changes in the statistics of processed data typically have a profound influence on the performance of trained models. Most of the available literature on drift detection employs either supervised methods, which require true labels at inference time, or unsupervised methods, which flag any change in the data distribution. We propose a novel task-sensitive semi-supervised drift detection framework, which uses label information to train a model but detects drifts at inference time only when they affect the model's performance, without requiring ground-truth labels. It utilizes a constrained low-dimensional embedding representation of the input data. This way, the dimensionality of the input data is reduced, and the learned representation is tailored to the classification task. We propose two change detectors customized for our framework, but any method for detecting a change in the statistics of a data stream can be used instead. Experimental evaluation on nine benchmark datasets with different types of artificially induced drift demonstrates that the proposed framework can reliably detect drifts. Furthermore, our studies show empirically that the investigated drift detectors, combined with the proposed framework, consistently outperform other state-of-the-art unsupervised drift detection approaches.
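To illustrate the remark that any change detector for a data stream can be plugged in, the following is a minimal sketch of one such detector running on a stream of low-dimensional embeddings. It is not either of the two detectors proposed in the paper: it uses a simple windowed mean-shift z-test as a stand-in, and the embeddings are simulated with Gaussian draws rather than produced by a trained model. The function name `detect_drift` and the parameters `window` and `threshold` are hypothetical choices made for this example.

```python
import numpy as np


def detect_drift(reference, stream, window=50, threshold=5.0):
    """Return the first stream index at which the mean of the last
    `window` embeddings deviates from the reference mean by more than
    `threshold` standard errors in any dimension, or None if it never does.

    Assumption: a mean-shift z-test stands in for the change detector;
    the abstract does not specify the detectors it proposes.
    """
    reference = np.asarray(reference, dtype=float)
    stream = np.asarray(stream, dtype=float)

    # Per-dimension statistics of the drift-free reference embeddings.
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0, ddof=1) + 1e-12  # guard against zero spread

    # Standard error of a window mean under the reference distribution.
    se = sigma / np.sqrt(window)

    for end in range(window, len(stream) + 1):
        window_mean = stream[end - window:end].mean(axis=0)
        z = np.abs(window_mean - mu) / se
        if z.max() > threshold:
            return end - 1  # index of the newest sample in the window
    return None


# Simulated 2-D embeddings (assumption: real ones would come from the model).
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 2))
stable = rng.normal(0.0, 1.0, size=(200, 2))
drifted = rng.normal(3.0, 1.0, size=(200, 2))  # mean shift starts at index 200
stream = np.concatenate([stable, drifted])

drift_index = detect_drift(reference, stream)
print("drift flagged at stream index:", drift_index)
```

In this sketch the alarm can only fire once enough drifted samples have entered the sliding window, so a larger `window` lowers the false-alarm rate at the cost of detection delay. A task-sensitive variant would monitor a statistic of the embedding that is tied to the classifier's decision, which this stand-in does not attempt.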