K-Means for Parallel Architectures Using All-Prefix-Sum Sorting and Updating Steps

Kai J. Kohlhoff, Vijay S. Pande, Russ B. Altman

Published: 2013, Last Modified: 10 Nov 2023. IEEE Trans. Parallel Distributed Syst. 2013.

Abstract: We present an implementation of parallel K-means clustering, called Kps-means, that achieves high performance with near-full occupancy compute kernels without imposing limits on the number of dimensions and data points permitted as input, thus combining flexibility with high degrees of parallelism and efficiency. As a key element to performance improvement, we introduce parallel sorting as data preprocessing and updating steps. Our final implementation for Nvidia GPUs achieves speedups of up to 200-fold over CPU reference code and of up to three orders of magnitude when compared with popular numerical software packages.
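The role of sorting in the update step can be illustrated with a small serial sketch. This is not the authors' Kps-means code and says nothing about their GPU kernels; it only shows, under simplifying assumptions, how an all-prefix-sum (exclusive scan) over per-cluster counts yields the offsets needed to sort points by cluster label, so that each centroid update becomes a reduction over one contiguous segment. All function names (`exclusive_scan`, `sort_by_label`, `update_centroids`, `assign`, `kmeans`) are hypothetical and chosen for illustration.

```python
from itertools import accumulate


def exclusive_scan(counts):
    """All-prefix-sum (exclusive): out[i] = sum(counts[:i])."""
    return list(accumulate(counts, initial=0))[:-1]


def sort_by_label(points, labels, k):
    """Counting sort of points by cluster label, using scan offsets."""
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    offsets = exclusive_scan(counts)      # start index of each cluster's segment
    cursor = offsets[:]                   # next free slot per cluster
    sorted_points = [None] * len(points)
    for p, lab in zip(points, labels):
        sorted_points[cursor[lab]] = p
        cursor[lab] += 1
    return sorted_points, offsets, counts


def update_centroids(sorted_points, offsets, counts, old_centroids):
    """Average each contiguous segment; keep the old centroid if a cluster is empty."""
    new_centroids = []
    for c, (start, n) in enumerate(zip(offsets, counts)):
        if n == 0:
            new_centroids.append(old_centroids[c])
            continue
        segment = sorted_points[start:start + n]
        dim = len(segment[0])
        new_centroids.append(
            tuple(sum(p[d] for p in segment) / n for d in range(dim))
        )
    return new_centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (squared Euclidean)."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [
        min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
        for p in points
    ]


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations with a sort-then-reduce update step."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        sorted_points, offsets, counts = sort_by_label(points, labels, len(centroids))
        centroids = update_centroids(sorted_points, offsets, counts, centroids)
    return centroids, assign(points, centroids)
```

In a parallel setting the counting, scan, and segment reductions would each map to data-parallel primitives; the sketch keeps them serial so the data flow is easy to follow.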
