Internal Evaluation of Density-Based Clusterings with Noise

Published: 26 Jan 2026, Last Modified: 28 Feb 2026, ICLR 2026 Poster, CC BY 4.0
Keywords: Evaluation, Clustering, Unsupervised Learning
TL;DR: We introduce the first internal evaluation measure that actively evaluates the quality of noise labels and density-based clustering results.
Abstract: Evaluating the quality of a clustering result without access to ground truth labels is fundamental for research in data mining. However, most cluster validation indices (CVIs) do not consider the noise assignments made by density-based clustering methods like DBSCAN or HDBSCAN, even though the ability to correctly determine noise is paramount to successful clustering. In this paper, we propose DISCO, a **D**ensity-based **I**nternal **S**core for **C**lusterings with n**O**ise, the first CVI to explicitly assess the *quality* of noise assignments rather than merely counting them. DISCO is based on the Silhouette Coefficient, but it adopts density-connectivity to evaluate clusters of arbitrary shapes and adds explicit noise evaluation: it rewards correctly assigned noise labels and penalizes noise labels where a cluster label would have been more appropriate. The pointwise definition of DISCO allows noise evaluation to be integrated seamlessly into the final clustering evaluation, while also enabling explainable evaluations of the clustered data. In contrast to most state-of-the-art methods, DISCO is well-defined for edge cases that regularly appear as output from clustering algorithms, such as singleton clusters or a single cluster plus noise.
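The abstract does not give DISCO's formulas, so the following is only a minimal sketch of the classic pointwise Silhouette Coefficient that DISCO is said to build on, not DISCO itself. It illustrates the gap the abstract describes: when the noise label is treated as just another cluster, scattered noise points receive negative scores even if labeling them as noise was reasonable. The function name `pointwise_silhouette`, the toy data, and the use of `-1` as the noise label (the convention in common DBSCAN/HDBSCAN implementations) are assumptions made for illustration.

```python
import numpy as np

NOISE = -1  # assumed noise label, as used by common DBSCAN/HDBSCAN implementations


def pointwise_silhouette(X, labels):
    """Classic silhouette s(i) = (b - a) / max(a, b) for every point.

    Every distinct label, including NOISE, is treated as an ordinary
    cluster. Points in singleton clusters get 0 by convention, and a
    labeling with fewer than two distinct labels returns all zeros here.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    unique = np.unique(labels)
    scores = np.zeros(len(X))
    if len(unique) < 2:
        return scores
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: score stays 0
        # a: mean distance to the other points with the same label
        a = dist[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the points of any other label
        b = min(dist[i, labels == c].mean() for c in unique if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return scores


# Toy data: two compact clusters plus two isolated points labeled as noise
X = [[0, 0], [0, 1], [1, 0],
     [10, 10], [10, 11], [11, 10],
     [5, -20], [-20, 5]]
labels = [0, 0, 0, 1, 1, 1, NOISE, NOISE]

scores = pointwise_silhouette(X, labels)
for point, label, s in zip(X, labels, scores):
    print(point, label, round(float(s), 3))
```

In this toy example the six clustered points score positively, while both noise points score negatively because they are far from each other. A silhouette-style index that treats noise as a cluster therefore penalizes these labels, which is the behavior a noise-aware measure would need to avoid.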
Primary Area: datasets and benchmarks
Submission Number: 7202