Abstract: Semi-supervised clustering leverages expert feedback to improve clustering quality. Explainability is important in this process: understanding how the system has incorporated the feedback helps the expert evaluate the quality of the resulting clustering and build trust in the clustering process. COBRAS is a state-of-the-art active, constraint-based semi-supervised clustering method that queries the user for pairwise constraints between selected instances. However, for complex data such as time series, COBRAS lacks both a concise way to explain the clustering and a mechanism to help the user decide when to stop the clustering process. In this study, we propose LinC, a method that connects two given instances with a chain built from the constraints provided by the user and the distance measure used in COBRAS. The chain explains why two instances are in the same cluster or in different clusters. The user may decide to stop the clustering process once a large fraction of the chains are correct and reliable, which indicates a high-quality clustering. We demonstrate LinC on univariate and multivariate time series. LinC can also be easily adapted to other types of instances that are easy to visualize, such as images.