Sublinear Time Algorithms for Greedy Selection in High Dimensions

Published: 20 May 2022, Last Modified: 05 May 2023
Venue: UAI 2022 Poster
Readers: Everyone
Keywords: greedy selection, k-center, convex hull approximation, sublinear time
TL;DR: We propose a sublinear time framework for greedy selection in high dimensions
Abstract: Greedy selection is a widely used idea for solving many machine learning problems, but greedy selection algorithms often have high time complexity and can therefore be prohibitive for large-scale data. In this paper, we consider two fundamental optimization problems in machine learning, k-center clustering and convex hull approximation, both of which can be solved via greedy selection. We propose sublinear time algorithms for them by combining randomization with greedy selection. Our results are similar in spirit to the linear time stochastic greedy selection algorithms for submodular maximization [Mirzasoleiman et al., AAAI 2015; Hassidim and Singer, ICML 2017], but with several important differences. Our runtimes are independent of the number of input data items. In particular, our runtime for k-center clustering significantly improves upon that of the uniform sampling approach [Huang et al., FOCS 2018], especially when the dimensionality is high. Moreover, our algorithms are particularly suitable for scenarios in which we cannot directly access the whole input data (for reasons such as privacy preservation, data storage, and transmission) and can only take a small sample via an oracle each time. Our sublinear algorithms improve efficiency in various applications, such as data selection and compression, active learning, and topic modeling.
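For readers unfamiliar with greedy selection for k-center, the following is a minimal sketch and not the algorithm proposed in the paper. `greedy_k_center` is the classical farthest-first traversal, which scans all points at every step. `sampled_greedy_k_center` is a purely illustrative variant, assumed here only to show what mixing randomization with the greedy step can look like: it evaluates each greedy step on a small random sample, so its cost per step does not depend on the number of points, but it carries no approximation guarantee. The function names and the `sample_size` and `first` parameters are hypothetical.

```python
import math
import random


def greedy_k_center(points, k, first=0):
    """Classical farthest-first traversal for k-center (full scan per step)."""
    centers = [first]  # index of an arbitrary first center
    # dist[i] = distance from point i to its nearest chosen center
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        # Greedy step: pick the point farthest from the current centers.
        far = max(range(len(points)), key=lambda i: dist[i])
        centers.append(far)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[far]))
    return centers, max(dist)  # chosen indices and the covering radius


def sampled_greedy_k_center(points, k, sample_size, first=0, seed=0):
    """Illustrative only: the greedy step is evaluated on a random sample.

    Assumption for illustration, not the paper's method. A sample that
    contains only existing centers can yield a duplicate center.
    """
    rng = random.Random(seed)
    centers = [first]
    while len(centers) < k:
        sample = [rng.randrange(len(points)) for _ in range(sample_size)]
        # Distance to the nearest center, computed only for sampled points.
        far = max(
            sample,
            key=lambda i: min(math.dist(points[i], points[c]) for c in centers),
        )
        centers.append(far)
    return centers
```

On five points forming two tight pairs and one distant point, `greedy_k_center(points, 3)` places one center in each group, which is the behavior the greedy rule is meant to produce.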
Supplementary Material: zip