Global Optimal K-Medoids Clustering of One Million Samples

Jiayang Ren; Kaixun Hua; Yankai Cao

Published: 31 Oct 2022, Last Modified: 10 Jan 2023, NeurIPS 2022 Accept, Readers: Everyone

Keywords: Large-Scale, Global Optimization, K-Medoids, Clustering, Lagrangian Relaxation, Branch and Bound, Bound Tightening

Abstract: We study the deterministic global optimization of the K-Medoids clustering problem. This work proposes a branch and bound (BB) scheme in which a tailored Lagrangian relaxation method, originally proposed in the 1970s, provides a lower bound at each BB node. This lower bounding method already guarantees the maximum gap at the root node. The lower bound has a closed-form solution that can be derived analytically without explicitly solving any optimization problem, and its computation is easily parallelized. Moreover, with this lower bounding method, finite convergence to the global optimal solution is guaranteed by branching only on the regions of the medoids. We also present several tailored bound tightening techniques that reduce the search space and the computational cost. Extensive computational studies on 28 machine learning datasets demonstrate that our algorithm provides a provably global optimal solution, up to an optimality gap of 0.1%, within 4 hours on datasets with up to one million samples. In addition, our algorithm obtains objective values better than or equal to those of the heuristic method. A theoretical proof of global convergence for our algorithm is also presented.
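To make the idea of a closed-form, parallelizable lower bound concrete, here is a minimal Python sketch of the classic Lagrangian relaxation of the p-median (K-Medoids) integer program: minimize the sum of d_ij x_ij subject to sum_j x_ij = 1 for every sample i, x_ij <= y_j, and sum_j y_j = K, with binary x and y. Relaxing the assignment constraints with multipliers lam_i decouples the problem by candidate medoid j, so the bound is sum_i lam_i plus the K smallest values of rho_j = sum_i min(0, d_ij - lam_i). This is an illustration under assumptions, not the authors' implementation: the exact "tailored" relaxation, the subgradient multiplier update, the nearest-neighbour initialization, and the names `kmedoids_cost`, `lagrangian_bound`, and `best_lagrangian_bound` are all hypothetical. Branching on medoid regions and bound tightening are omitted.

```python
import itertools
import random


def kmedoids_cost(dist, medoids):
    # K-Medoids objective: every sample pays its distance to the nearest chosen medoid.
    return sum(min(row[j] for j in medoids) for row in dist)


def lagrangian_bound(dist, k, lam):
    # Relaxing sum_j x_ij = 1 with multipliers lam[i] decouples the problem per
    # candidate medoid j, so the bound has a closed form (no solver call):
    #   rho_j  = sum_i min(0, d_ij - lam_i)
    #   L(lam) = sum_i lam_i + (sum of the k smallest rho_j)
    # Each rho_j is independent of the others, so this loop is trivially parallel over j.
    n = len(dist)
    rho = [sum(min(0.0, dist[i][j] - lam[i]) for i in range(n)) for j in range(n)]
    chosen = sorted(range(n), key=lambda j: rho[j])[:k]
    return sum(lam) + sum(rho[j] for j in chosen), chosen


def best_lagrangian_bound(dist, k, iters=300):
    # Hypothetical multiplier update: plain subgradient ascent with step 1/(t+1).
    # By weak duality, ANY lam yields a valid lower bound; ascent only tightens it.
    n = len(dist)
    # Assumed initialization: each sample's nearest-neighbour distance.
    lam = [min(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    best = float("-inf")
    for t in range(iters):
        value, chosen = lagrangian_bound(dist, k, lam)
        best = max(best, value)
        # Supergradient of the concave dual: 1 minus the number of chosen
        # medoids that sample i is assigned to in the relaxed solution.
        g = [1 - sum(dist[i][j] < lam[i] for j in chosen) for i in range(n)]
        if not any(g):
            break  # zero supergradient: lam maximizes the dual
        lam = [lam[i] + g[i] / (t + 1) for i in range(n)]
    return best


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(12)]
    dist = [[((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5 for b in pts] for a in pts]
    k = 3
    # Brute-force global optimum, feasible only for tiny n.
    opt = min(kmedoids_cost(dist, m) for m in itertools.combinations(range(len(pts)), k))
    lb = best_lagrangian_bound(dist, k)
    # Relative gap (UB - LB) / UB, a common definition assumed here.
    print(f"optimum={opt:.4f}  lower bound={lb:.4f}  gap={(opt - lb) / opt:.2%}")
```

On a toy instance the brute-force optimum plays the role of the upper bound; in a BB scheme the upper bound would instead come from a feasible set of medoids, and a node could be pruned once its lower bound reaches the incumbent.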

