Clustering using Approximate Nearest Neighbour Oracles

Published: 31 Mar 2023, Last Modified: 31 Mar 2023, Accepted by TMLR
Abstract: We study the problem of clustering data points in a streaming setting when one has access to the geometry of the space only via approximate nearest neighbour (ANN) oracles. In this setting, we present algorithms for streaming $O(1)$-approximate $k$-median clustering and its (streaming) coreset construction. In certain domains of interest, such as spaces with constant expansion, our algorithms improve upon the best-known runtimes for both of these problems. Furthermore, our results extend to cost functions satisfying the approximate triangle inequality, a class that subsumes $k$-means clustering and $M$-estimators. Finally, we run experiments on the Census1990 dataset, and the results empirically support our theory.
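Two notions from the abstract, a clustering cost evaluated through a nearest-neighbour oracle and the approximate triangle inequality, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the oracle below is a brute-force exact stand-in for an ANN oracle, Euclidean distance is assumed, and every function name is hypothetical. The check relies on the standard fact that for $p \ge 1$, $d(x,z)^p \le 2^{p-1}\,(d(x,y)^p + d(y,z)^p)$, so squared distance ($k$-means) satisfies the triangle inequality up to a factor of 2.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples (an assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def exact_nn_oracle(query, centers):
    """Hypothetical stand-in for an ANN oracle: brute-force exact nearest center.

    A real ANN oracle would only guarantee a center within a constant
    factor of the nearest distance.
    """
    return min(range(len(centers)), key=lambda i: dist(query, centers[i]))


def clustering_cost(points, centers, oracle, power=1):
    """Sum of dist(point, assigned center) ** power.

    power=1 gives the k-median cost, power=2 gives the k-means cost.
    The assignment is delegated to the oracle.
    """
    return sum(dist(p, centers[oracle(p, centers)]) ** power for p in points)


def satisfies_approx_triangle(x, y, z, power=2):
    """Check d(x,z)^p <= 2^(p-1) * (d(x,y)^p + d(y,z)^p) for p >= 1."""
    lam = 2 ** (power - 1)
    lhs = dist(x, z) ** power
    rhs = lam * (dist(x, y) ** power + dist(y, z) ** power)
    return lhs <= rhs + 1e-9  # small tolerance for floating-point error
```

With collinear points `(0, 0)`, `(1, 0)`, `(2, 0)` and `power=2`, the inequality holds with equality, which shows the factor of 2 cannot be improved for squared distances.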
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Revised the paper to incorporate the suggestions of the reviewers and area chair.
Assigned Action Editor: ~Qin_Zhang1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 556