Extreme Value k-means Clustering

Sixiao Zheng; Yanxi Hou; Yanwei Fu; Jianfeng Feng

Extreme Value k-means Clustering

Sixiao Zheng, Yanxi Hou, Yanwei Fu, Jianfeng Feng

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: This paper introduces Extreme Value Theory into k-means to measure similarity and proposes a novel algorithm called Extreme Value k-means for clustering.

Abstract: Clustering is the central task in unsupervised learning and data mining. k-means is one of the most widely used clustering algorithms. Unfortunately, it is generally non-trivial to extend k-means to cluster data points beyond Gaussian distribution, particularly, the clusters with non-convex shapes (Beliakov & King, 2006). To this end, we, for the first time, introduce Extreme Value Theory (EVT) to improve the clustering ability of k-means. Particularly, the Euclidean space was transformed into a novel probability space denoted as extreme value space by EVT. We thus propose a novel algorithm called Extreme Value k-means (EV k-means), including GEV k-means and GPD k-means. In addition, we also introduce the tricks to accelerate Euclidean distance computation in improving the computational efficiency of classical k-means. Furthermore, our EV k-means is extended to an online version, i.e., online Extreme Value k-means, in utilizing the Mini Batch k-means to cluster streaming data. Extensive experiments are conducted to validate our EV k-means and online EV k-means on synthetic datasets and real datasets. Experimental results show that our algorithms significantly outperform competitors in most cases.

Keywords: unsupervised learning, clustering, k-means, Extreme Value Theory

Original Pdf: pdf

5 Replies

Loading