Semi-supervised Density-Based Clustering

Levi Lelis, Jörg Sander

Published: 2009, Last Modified: 15 May 2025ICDM 2009EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Most of the effort in the semi-supervised clustering literature was devoted to variations of the K-means algorithm. In this paper we show how background knowledge can be used to bias a partitional density-based clustering algorithm. Our work describes how labeled objects can be used to help the algorithm detecting suitable density parameters for the algorithm to extract density-based clusters in specific parts of the feature space. Considering the set of constraints estabilished by the labeled dataset we show that our algorithm, called SSDBSCAN, automatically finds density parameters for each natural cluster in a dataset. Four of the most interesting characteristics of SSDBSCAN are that (1) it only requires a single, robust input parameter, (2) it does not need any user intervention, (3) it automatically finds the noise objects according to the density of the natural clusters and (4) it is able to find the natural cluster structure even when the density among clusters vary widely. The algorithm presented in this paper is evaluated with artificial and real-world datasets, demonstrating better results when compared to other unsupervised and semi-supervised density-based approaches.