Quantum (Inspired)  $D^2$-sampling with Applications

Poojan Chetan Shah; Ragesh Jaiswal

Quantum (Inspired) $D^2$-sampling with Applications

Poojan Chetan Shah, Ragesh Jaiswal

Published: 22 Jan 2025, Last Modified: 23 May 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: quantum, clustering, k-means++

Abstract: $D^2$-sampling is a fundamental component of sampling-based clustering algorithms such as $k$-means++. Given a dataset $V \subset \mathbb{R}^d$ with $N$ points and a center set $C \subset \mathbb{R}^d$, $D^2$-sampling refers to picking a point from $V$ where the sampling probability of a point is proportional to its squared distance from the nearest center in $C$. The popular $k$-means++ algorithm is simply a $k$-round $D^2$-sampling process, which runs in $O(Nkd)$ time and gives $O(\log{k})$-approximation in expectation for the $k$-means problem. In this work, we give a quantum algorithm for (approximate) $D^2$-sampling in the QRAM model that results in a quantum implementation of $k$-means++ with a running time $\tilde{O}(\zeta^2 k^2)$. Here $\zeta$ is the aspect ratio ( i.e., largest to smallest interpoint distance) and $\tilde{O}$ hides polylogarithmic factors in $N, d, k$. It can be shown through a robust approximation analysis of $k$-means++ that the quantum version preserves its $O(\log{k})$ approximation guarantee. Further, we show that our quantum algorithm for $D^2$-sampling can be dequantized using the sample-query access model of Tang (PhD Thesis, Ewin Tang, University of Washington, 2023). This results in a fast quantum-inspired classical implementation of $k$-means++, which we call QI-$k$-means++, with a running time $O(Nd) + \tilde{O}(\zeta^2k^2d)$, where the $O(Nd)$ term is for setting up the sample-query access data structure. Experimental investigations show promising results for QI-$k$-means++ on large datasets with bounded aspect ratio. Finally, we use our quantum $D^2$-sampling with the known $ D^2$-sampling-based classical approximation scheme to obtain the first quantum approximation scheme for the $k$-means problem with polylogarithmic running time dependence on $N$.

Supplementary Material: zip

Primary Area: optimization

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11362

Loading