We present the first mini-batch kernel $k$-means algorithm. It achieves an order-of-magnitude improvement in running time over the full-batch algorithm, with only a minor negative effect on solution quality. Specifically, a single iteration of our algorithm requires only $O(n(k+b))$ time, compared to $O(n^2)$ for full-batch kernel $k$-means, where $n$ is the size of the dataset, $k$ is the number of clusters, and $b$ is the batch size.
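To make the per-iteration cost concrete, here is a minimal sketch of one plausible mini-batch kernel $k$-means iteration. It is not the paper's exact procedure: the RBF kernel choice, the representation of each center as the set of points assigned to it so far, and all function names are illustrative assumptions. Each batch point is assigned to the nearest center under the standard kernelized distance and then folded into that center.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def minibatch_kernel_kmeans(X, k, b, iters, kernel=rbf_kernel, seed=0):
    """Illustrative sketch only -- not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each center is the feature-space mean of its member points; start with one
    # random point per center (k-means++ seeding could be used instead).
    members = [[i] for i in rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        batch = rng.choice(n, size=b, replace=False)
        # Squared feature-space distance from batch point x to center c with members S_c:
        #   k(x,x) - (2/|S_c|) * sum_{y in S_c} k(x,y) + (1/|S_c|^2) * sum_{y,z in S_c} k(y,z)
        K_xx = np.diag(kernel(X[batch], X[batch]))
        dists = np.empty((b, k))
        for c, S in enumerate(members):
            K_xS = kernel(X[batch], X[S])
            dists[:, c] = K_xx - 2 * K_xS.mean(1) + kernel(X[S], X[S]).mean()
        labels = dists.argmin(1)
        # Fold each batch point into the center it was assigned to.
        for i, c in zip(batch, labels):
            members[c].append(i)
    return members
```

For example, `minibatch_kernel_kmeans(X, k=5, b=64, iters=20)` on an $n \times d$ array `X` returns, for each center, the indices of the points currently defining it.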
We provide a theoretical analysis of our algorithm under an early stopping condition and show that if the batch size is $\Omega((\gamma / \epsilon)^2\log (n\gamma/\epsilon))$, the algorithm terminates within $O(\gamma^2/\epsilon)$ iterations with high probability, where $\gamma$ is an upper bound on the feature-space norm of the points in the dataset and $\epsilon$ is a threshold parameter controlling termination. Our results hold for any reasonable initialization of the centers, and when the algorithm is initialized with the $k$-means++ scheme it achieves an $O(\log k)$ approximation ratio.
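The abstract does not spell out the stopping rule itself; purely to illustrate the role of the threshold $\epsilon$, the following hypothetical wrapper stops as soon as the estimated per-iteration improvement falls below $\epsilon$ (the callback name and the specific criterion are assumptions, not the paper's condition).

```python
def run_with_early_stopping(step, epsilon, max_iters=1000):
    """Run `step` (one mini-batch iteration returning a cost estimate) until the
    estimated improvement drops below `epsilon` or `max_iters` is reached.
    Illustrative only; not the stopping condition analyzed in the paper."""
    prev_cost = float("inf")
    for t in range(max_iters):
        cost = step(t)
        if prev_cost - cost < epsilon:  # improvement below the threshold: stop early
            return t + 1                # number of iterations performed
        prev_cost = cost
    return max_iters
```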
Many popular kernels are normalized (e.g., Gaussian, Laplacian), which implies $\gamma=1$. For such kernels, taking $\epsilon$ to be a constant and $b=\Theta(\log n)$, our algorithm terminates within $O(1)$ iterations, each taking $O(n(\log n+k))$ time.
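To see how these figures follow from the bounds stated above: a normalized kernel satisfies $k(x,x)=1$ for every $x$ (for the Gaussian kernel, $k(x,x)=e^{-\|x-x\|^2/(2\sigma^2)}=1$), so the feature-space norm is
\[
\|\phi(x)\| = \sqrt{k(x,x)} = 1 \quad\Longrightarrow\quad \gamma = 1 .
\]
Substituting $\gamma=1$ and constant $\epsilon$ into the earlier bounds, the batch-size requirement becomes $\Omega(\log n)$, which is met by $b=\Theta(\log n)$; the iteration bound becomes $O(\gamma^2/\epsilon)=O(1)$; and the per-iteration cost $O(n(k+b))$ becomes $O(n(k+\log n))$.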