The seeding algorithm for spherical k-means clustering with penalties

Sai Ji; Dachuan Xu; Longkun Guo; Min Li; Dongmei Zhang

Published: 01 Jan 2022, Last Modified: 23 Jan 2025. J. Comb. Optim. 2022. CC BY-SA 4.0

Abstract: Spherical k-means clustering, a known NP-hard variant of the k-means problem, has broad applications in data mining. In contrast to k-means, it aims to partition a given collection of data points lying on a spherical surface into k sets so as to minimize the within-cluster sum of cosine dissimilarities. In this paper, we introduce spherical k-means clustering with penalties and give a \(2\max \{2,M\}(1+M)(\ln k+2)\)-approximation algorithm, where M is the ratio of the maximal to the minimal penalty cost of the given data set. Moreover, we prove that on separable instances of spherical k-means clustering with penalties, our algorithm achieves an approximation ratio of \(2\max \{3,M+1\}\) with high probability.
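The abstract does not spell out the seeding procedure, so the following is only a minimal sketch of what a k-means++-style seeding with penalties could look like on the unit sphere. It assumes the cosine dissimilarity is \(1 - \langle x, c\rangle\) for unit vectors, that each point pays the smaller of its penalty and its dissimilarity to the nearest center, that the first center is drawn uniformly, and that later centers are drawn with probability proportional to that per-point cost. The function names (`seed_with_penalties`, `total_cost`) are hypothetical and are not taken from the paper.

```python
import math
import random


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cos_dissim(x, c):
    """Cosine dissimilarity 1 - <x, c> for unit vectors (assumed definition)."""
    return max(0.0, 1.0 - sum(a * b for a, b in zip(x, c)))


def point_cost(x, penalty, centers):
    """A point either pays its penalty or its dissimilarity to the nearest center."""
    return min(penalty, min(cos_dissim(x, c) for c in centers))


def seed_with_penalties(points, penalties, k, rng=None):
    """Pick up to k initial centers among the (normalized) input points.

    Assumption: first center uniform at random, each later center sampled
    with probability proportional to the current penalized cost of a point.
    """
    rng = rng or random.Random(0)
    pts = [normalize(p) for p in points]
    centers = [pts[rng.randrange(len(pts))]]
    while len(centers) < k:
        weights = [point_cost(x, p, centers) for x, p in zip(pts, penalties)]
        if sum(weights) <= 0.0:
            break  # every point is already served at zero cost
        idx = rng.choices(range(len(pts)), weights=weights, k=1)[0]
        centers.append(pts[idx])
    return centers


def total_cost(points, penalties, centers):
    """Objective value: sum of per-point penalized costs."""
    pts = [normalize(p) for p in points]
    return sum(point_cost(x, p, centers) for x, p in zip(pts, penalties))


if __name__ == "__main__":
    data = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]
    pens = [0.5, 0.5, 0.5]
    seeds = seed_with_penalties(data, pens, k=2)
    print(len(seeds), total_cost(data, pens, seeds))
```

In this sketch a point whose penalty is smaller than its dissimilarity to every chosen center contributes only its penalty to the sampling weights, so far-away outliers are less likely to be picked as centers than under plain k-means++ sampling.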
