Fully-Scalable Massively Parallel Algorithm for k-center with Outliers

Published: 01 Jan 2025 · Last Modified: 16 May 2025 · AAAI 2025 · CC BY-SA 4.0
Abstract: In this paper, we consider the k-center problem with outliers (the (k, z)-center problem) in the context of Massively Parallel Computation (MPC). Existing MPC algorithms for the (k, z)-center problem typically require Ω(k) local space per machine. While this may be feasible when k is small, these algorithms become impractical for large k, where each machine may lack sufficient space for computation. This motivates the study of fully-scalable algorithms with sublinear local space. We propose the first fully-scalable MPC algorithm for the (k, z)-center problem. The main challenge is to design an MPC algorithm that, using only sublinear local space, finds inliers close to the optimal clustering centers while keeping the approximation loss bounded. To address this, we propose an iterative sampling-based algorithm whose local space is sublinear in the data size. A key component of our approach is an outlier-removal algorithm that adjusts the sample size in each iteration so that inliers are selected as clustering centers. However, the number of discarded inliers grows with the number of iterations of the outlier-removal algorithm, making it difficult to bound. To handle this, we propose a self-adaptive method that automatically adjusts the sample size to account for the different data distributions on each machine, ensuring a lower bound on the sampling success probability. With these techniques, we present an O(log* n)-approximation MPC algorithm for the (k, z)-center problem in constant-dimensional Euclidean space. The algorithm discards at most (1 + ε)z outliers and completes in O(log log n) computation rounds using Θ(n^δ) local space per machine.
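To make the high-level idea of iterative sampling with outlier removal concrete, the following is a minimal single-machine Python sketch, not the paper's MPC algorithm. It assumes a guessed covering radius (in practice found by search) and an illustrative doubling rule for the sample size; the function names, parameters, and the specific adaptive rule are assumptions for illustration only.

```python
import math
import random


def dist(p, q):
    # Euclidean distance between two points given as coordinate tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def iterative_sampling_kz_center(points, k, z, radius, eps=0.1, seed=0):
    """Illustrative (k, z)-center sketch: return (centers, outliers).

    points : list of coordinate tuples in constant-dimensional Euclidean space
    k      : number of centers
    z      : number of allowed outliers
    radius : guessed optimal radius (a real algorithm would search over this)
    """
    rng = random.Random(seed)
    uncovered = list(points)
    centers = []
    # Assumed adaptive rule: start with a small sample and grow it each round,
    # loosely mimicking the self-adaptive sample-size idea from the abstract.
    sample_size = max(1, int(math.log(len(points) + 1)))

    while len(uncovered) > (1 + eps) * z and len(centers) < k:
        sample = rng.sample(uncovered, min(sample_size, len(uncovered)))
        # Prefer the sampled point covering the most uncovered points; points
        # sampled from dense regions are likely inliers near optimal centers.
        best = max(
            sample,
            key=lambda c: sum(1 for p in uncovered if dist(p, c) <= 2 * radius),
        )
        centers.append(best)
        uncovered = [p for p in uncovered if dist(p, best) > 2 * radius]
        sample_size *= 2  # illustrative growth rule, not the paper's analysis

    # Whatever remains uncovered is treated as the discarded outlier set.
    return centers, uncovered
```

In an MPC setting the sampling and coverage-counting steps would be distributed across machines with Θ(n^δ) local space each; this sketch only illustrates the per-round logic of sampling candidates, covering nearby points, and discarding the rest as outliers.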