We present the first mini-batch kernel $k$-means algorithm. It achieves an order-of-magnitude improvement in running time over the full-batch algorithm, with only a minor negative effect on solution quality. Specifically, a single iteration of our algorithm requires only $O(n(k+b))$ time, compared to $O(n^2)$ for full-batch kernel $k$-means, where $n$ is the size of the dataset, $k$ is the number of clusters, and $b$ is the batch size.
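To make the per-iteration cost concrete, here is a minimal sketch of one plausible mini-batch kernel $k$-means iteration. It is not the paper's exact procedure: the RBF kernel choice, the representation of each center as the set of points assigned to it so far, and all function names are illustrative assumptions. Each batch point is assigned to the nearest center under the standard kernelized distance and then folded into that center.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def minibatch_kernel_kmeans(X, k, b, iters, kernel=rbf_kernel, seed=0):
    """Illustrative sketch only -- not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each center is the feature-space mean of its member points; start with one
    # random point per center (k-means++ seeding could be used instead).
    members = [[i] for i in rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        batch = rng.choice(n, size=b, replace=False)
        # Squared feature-space distance from batch point x to center c with members S_c:
        #   k(x,x) - (2/|S_c|) * sum_{y in S_c} k(x,y) + (1/|S_c|^2) * sum_{y,z in S_c} k(y,z)
        K_xx = np.diag(kernel(X[batch], X[batch]))
        dists = np.empty((b, k))
        for c, S in enumerate(members):
            K_xS = kernel(X[batch], X[S])
            dists[:, c] = K_xx - 2 * K_xS.mean(1) + kernel(X[S], X[S]).mean()
        labels = dists.argmin(1)
        # Fold each batch point into the center it was assigned to.
        for i, c in zip(batch, labels):
            members[c].append(i)
    return members
```

For example, `minibatch_kernel_kmeans(X, k=5, b=64, iters=20)` on an $n \times d$ array `X` returns, for each center, the indices of the points currently defining it.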
We provide a theoretical analysis of our algorithm under an early stopping condition and show that if the batch size is $\Omega((\gamma / \epsilon)^2\log (n\gamma/\epsilon))$, the algorithm terminates within $O(\gamma^2/\epsilon)$ iterations with high probability, where $\gamma$ is an upper bound on the feature-space norm of the points in the dataset and $\epsilon$ is a threshold parameter controlling termination. Our results hold for any reasonable initialization of the centers, and when the algorithm is initialized with the $k$-means++ scheme it achieves an $O(\log k)$ approximation ratio.
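The abstract does not spell out the stopping rule itself; purely to illustrate the role of the threshold $\epsilon$, the following hypothetical wrapper stops as soon as the estimated per-iteration improvement falls below $\epsilon$ (the callback name and the specific criterion are assumptions, not the paper's condition).

```python
def run_with_early_stopping(step, epsilon, max_iters=1000):
    """Run `step` (one mini-batch iteration returning a cost estimate) until the
    estimated improvement drops below `epsilon` or `max_iters` is reached.
    Illustrative only; not the stopping condition analyzed in the paper."""
    prev_cost = float("inf")
    for t in range(max_iters):
        cost = step(t)
        if prev_cost - cost < epsilon:  # improvement below the threshold: stop early
            return t + 1                # number of iterations performed
        prev_cost = cost
    return max_iters
```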
Many popular kernels are normalized (e.g., Gaussian, Laplacian), which implies $\gamma=1$. For such kernels, taking $\epsilon$ to be a constant and $b=\Theta(\log n)$, our algorithm terminates within $O(1)$ iterations, each taking $O(n(\log n+k))$ time.
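To see how these figures follow from the bounds stated above: a normalized kernel satisfies $k(x,x)=1$ for every $x$ (for the Gaussian kernel, $k(x,x)=e^{-\|x-x\|^2/(2\sigma^2)}=1$), so the feature-space norm is
\[
\|\phi(x)\| = \sqrt{k(x,x)} = 1 \quad\Longrightarrow\quad \gamma = 1 .
\]
Substituting $\gamma=1$ and constant $\epsilon$ into the earlier bounds, the batch-size requirement becomes $\Omega(\log n)$, which is met by $b=\Theta(\log n)$; the iteration bound becomes $O(\gamma^2/\epsilon)=O(1)$; and the per-iteration cost $O(n(k+b))$ becomes $O(n(k+\log n))$.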