LinC: Explaining Time Series Clusterings with User-Provided Constraints

Simiao Lin; Aras Yurtman; Jonas Soenen; Hendrik Blockeel

Published: 01 Jan 2023, Last Modified: 25 Jan 2025. Venue: PKDD/ECML Workshops (3) 2023. License: CC BY-SA 4.0

Abstract: Semi-supervised clustering leverages expert feedback to improve the clustering quality. Explainability is important in this process: understanding how the system has incorporated the feedback helps the expert evaluate the quality of the obtained clustering and build trust in the clustering process. COBRAS is a state-of-the-art active constraint-based semi-supervised clustering method that queries the user for pairwise constraints between selected instances. However, for complex data such as time series, COBRAS lacks a method to explain the clustering neatly and a mechanism to assist the user in deciding when to stop the clustering process. In this study, we propose LinC, a method that connects two given instances with a chain based on the constraints provided by the user and the distance measure used in COBRAS. This provides an explanation for why two instances are in the same cluster or in different clusters. The user might decide to stop the clustering process when there is a large fraction of correct and reliable chains, indicating a high-quality clustering. We demonstrate LinC for univariate and multivariate time series in this paper. LinC can also be easily adapted to other types of visualization-friendly instances such as images.
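The chain idea can be illustrated with a toy sketch. The abstract does not give the construction, so this is not the paper's algorithm but one possible reading of it. It assumes a hypothetical cost model in which a hop along a user-provided must-link constraint costs nothing and any other hop costs the distance between the two instances; the cheapest chain is then found with Dijkstra's algorithm. The function name `find_chain`, its parameters, and the cost model are all illustrative assumptions.

```python
import heapq


def find_chain(n, must_links, dist, start, goal):
    """Return (cost, path) of the cheapest chain from start to goal.

    Assumed cost model (hypothetical, not taken from the paper):
    a hop along a user-provided must-link costs 0, and any other
    hop costs dist(u, v).
    """
    ml = {frozenset(pair) for pair in must_links}
    best = {start: 0.0}   # cheapest known cost to reach each instance
    prev = {}             # predecessor on the cheapest chain
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in range(n):
            if v == u:
                continue
            step = 0.0 if frozenset((u, v)) in ml else dist(u, v)
            new_cost = cost + step
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    # Walk the predecessors back from goal to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return best[goal], path


# Toy data: four 1-D "instances" and one must-link from the user.
values = [0.0, 1.0, 10.0, 11.0]
cost, path = find_chain(
    n=len(values),
    must_links=[(1, 2)],
    dist=lambda i, j: abs(values[i] - values[j]),
    start=0,
    goal=3,
)
```

In this toy run the chain goes through the must-link between instances 1 and 2 rather than jumping directly from 0 to 3, which is the kind of step-by-step link a user could inspect. For real time series, `dist` would be replaced by whatever distance measure the clustering uses.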
