Capacitated Online Clustering Algorithm

Published: 01 Jan 2024, Last Modified: 25 Jan 2025ECAI 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Clustering is a widely used unsupervised learning tool with applications in numerous real-world problems. Traditional clustering methods can result in highly skewed clusters where one cluster is notably larger than others, rendering them unsuitable for scenarios such as logistics and routing. In response, capacitated clustering approaches have emerged over the past decade. These approaches limit the number of data points each cluster can accommodate, thus resulting in more uniform cluster formations. In an online version of capacitated clustering, the algorithm must make an irrevocable decision for each incoming data point, determining whether to establish it as a new center or allocate it to existing centers. The goal is to minimize the count of opened centers while adhering to capacity constraints and achieving a satisfactory approximation of the clustering cost compared to the optimal solution. Although exploring online capacitated clustering remains uncharted, we are the first to propose a probabilistic Capacitated Online Clustering Algorithm (called COCA) for h-dimensional euclidean spaces. We theoretically bound the number of centers opened and provide constant cost approximation guarantees. Additionally, we conduct rigorous experiments to validate the computational efficacy of the proposed approaches.
Loading