Keywords: clustering; coreset; Wasserstein distance
Abstract: The classical metric $k$-center problem is widely used in data representation tasks. However, real-world datasets often contain noise and exhibit complex structures, for which the classical formulation is insufficient. To address these challenges, we present the \textbf{R}obust \textbf{W}asserstein \textbf{C}enter clustering (RWC-clustering) problem.
Compared to the classical setting, the main challenge in designing an algorithm for the RWC-clustering problem lies in effectively handling noise in the cluster centers. To this end, we introduce a dedicated purification step to eliminate noise, based on which we develop our clustering algorithm.
Furthermore, on large-scale datasets, both storage and computation become highly resource-intensive. To alleviate this, we adopt the \textit{coreset} technique, which compresses the dataset to improve computational and storage efficiency.
Roughly speaking, the coreset method lets us compute the objective value on a small coreset while guaranteeing, in theory, a close approximation to the value on the original dataset; it thus substantially reduces storage and computation costs.
Finally, experimental results show the effectiveness of our RWC-clustering problem and the efficiency of the coreset method.
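To make the idea of evaluating an objective on a compressed dataset concrete, the following is a minimal sketch for the classical metric $k$-center problem only. It is not the RWC-clustering algorithm or the coreset construction of this submission, neither of which is specified in the abstract. It assumes Euclidean points and uses the standard farthest-first (Gonzalez-style) greedy rule to pick a small summary; the names `farthest_first`, `kcenter_cost`, `summary`, and `radius` are hypothetical. For a summary drawn from the data with covering radius `radius`, the triangle inequality gives, for any center set, a cost on the summary that is at most the cost on the full data and at least that cost minus `radius`.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.dist(p, q)


def kcenter_cost(points, centers):
    """Classical k-center objective: largest distance from a point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


def farthest_first(points, m, start=0):
    """Greedy selection: repeatedly add the point farthest from those already chosen.

    Returns the chosen points and the covering radius, i.e. the largest
    distance from any input point to its nearest chosen point.
    """
    chosen = [points[start]]
    # nearest[i] = distance from points[i] to the closest chosen point so far
    nearest = [dist(p, chosen[0]) for p in points]
    while len(chosen) < m:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(points[idx])
        nearest = [min(old, dist(p, points[idx])) for old, p in zip(nearest, points)]
    return chosen, max(nearest)


# Synthetic data (an assumption for illustration): three Gaussian blobs, 600 points.
rng = random.Random(0)
data = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0, 0), (5, 5), (0, 5)]
    for _ in range(200)
]

summary, radius = farthest_first(data, m=30)   # 30-point summary of the data
centers, _ = farthest_first(data, m=3)         # candidate centers for k = 3

full_cost = kcenter_cost(data, centers)
small_cost = kcenter_cost(summary, centers)
print("points:", len(data), "summary size:", len(summary))
print("cost on full data:", full_cost)
print("cost on summary:  ", small_cost)
print("covering radius:  ", radius)
```

The gap between the two printed costs is bounded by the covering radius, which shrinks as the summary grows. Coreset guarantees in the literature are typically stated in a relative, $(1 \pm \varepsilon)$ form, and the guarantee for RWC-clustering may differ from this additive bound.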
Supplementary Material:  zip
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 10746