Nonparametric Bootstrap Likelihood Estimation to Investigate the Chance Set-Up on Clustering Results

Ammar Elnour, Wencheng Yang, Yan Li

Published: 2025, Last Modified: 20 May 2025. IEEE Open J. Comput. Soc. 2025. License: CC BY-SA 4.0

Abstract: Clustering algorithms are widely used in knowledge discovery, but concerns about the validity of their results must be addressed. The datasets commonly used for clustering tasks are often large and scale-free, which makes conventional statistical techniques inadequate for analyzing the uncertainty of the results. The same issue affects most outcomes of other knowledge discovery techniques, such as machine learning and statistical learning. Traditional statistical methods assume that the data follow standard distributions, whereas resampling and bootstrapping methods offer more accurate and reliable alternatives. This article introduces a method that employs bootstrap likelihood estimation to infer the uncertainty of generated clustering structures. We first calculate the clustering error on the original dataset and then use the proposed method to estimate its nonparametric bootstrapped likelihood. Comparing these two values establishes a nonparametric significance testing framework that directly determines the validity of the result. To evaluate the effectiveness of our method, we conduct experiments on synthetic and real datasets. The results demonstrate that our method can successfully validate clustering results.
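The general idea of comparing a clustering error on the original data with its distribution under bootstrap resampling can be sketched as follows. This is only an illustrative sketch and not the authors' algorithm, which the abstract does not specify: the 1-D k-means routine, the error measure (mean squared distance to the nearest centroid), the helper names `kmeans_error`, `bootstrap_errors`, and `tail_proportion`, and the use of a simple empirical tail proportion as a stand-in for the bootstrap likelihood are all assumptions made for illustration.

```python
import random


def kmeans_error(points, k, iters=20):
    """Mean squared distance to the nearest centroid (1-D data).

    Hypothetical error measure; the paper's clustering error may differ.
    """
    pts = sorted(points)
    n = len(pts)
    # Deterministic initialisation: spread centroids across the sorted data.
    centroids = [pts[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            groups[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            sum(g) / len(g) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return sum(min((p - c) ** 2 for c in centroids) for p in pts) / n


def bootstrap_errors(points, k, n_boot=200, seed=0):
    """Clustering error recomputed on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(points)
    return [
        kmeans_error([points[rng.randrange(n)] for _ in range(n)], k)
        for _ in range(n_boot)
    ]


def tail_proportion(observed, boot):
    """Fraction of bootstrap errors at least as large as the observed error.

    A simple stand-in for a bootstrap likelihood, assumed for illustration.
    """
    return sum(e >= observed for e in boot) / len(boot)


# Synthetic data: two well-separated 1-D groups (illustrative only).
_rng = random.Random(42)
data = [_rng.gauss(0.0, 1.0) for _ in range(30)] + \
       [_rng.gauss(10.0, 1.0) for _ in range(30)]

observed_error = kmeans_error(data, k=2)
boot = bootstrap_errors(data, k=2, n_boot=200, seed=1)
proportion = tail_proportion(observed_error, boot)
print(observed_error, proportion)
```

In this sketch, the spread of `boot` indicates how much the clustering error varies under resampling, and `proportion` locates the observed error within that empirical distribution without assuming any parametric form for the data.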