Abstract: Spherical k-means clustering as a known NP-hard variant of the k-means problem has broad applications in data mining. In contrast to k-means, it aims to partition a collection of given data distributed on a spherical surface into k sets so as to minimize the within-cluster sum of cosine dissimilarity. In the paper, we introduce spherical k-means clustering with penalties and give a \(2\max \{2,M\}(1+M)(\ln k+2)\)-approximation algorithm. Moreover, we prove that when against spherical k-means clustering with penalties but on separable instances, our algorithm is with an approximation ratio \(2\max \{3,M+1\}\) with high probability, where M is the ratio of the maximal and the minimal penalty cost of the given data set.
Loading