Keywords: Kernel Density Estimation, Approximate Nearest Neighbor Search
TL;DR: We design data structures for Kernel Density Estimation with improved query time, as well as the first space vs. query time tradeoffs.
Abstract: In this paper we study the Kernel Density Estimation (KDE) problem: given a dataset $\mathcal{P}$ of $n$ points in Euclidean space and a kernel $K(p,q)$, prepare a low-space data structure that, given a query $q$, can quickly output a $(1\pm \epsilon)$-approximation to $\mu=\left(\sum_{p\in \mathcal{P}}K(p,q)\right)/n$. Recent advances have used tools from Locality Sensitive Hashing (LSH) and Approximate Nearest Neighbor (ANN) search to build KDE data structures with query time sublinear in $1/\mu$ and space linear in $1/\mu$, with Charikar et al. (2020) achieving the current best query time of $\approx 1/\mu^{0.173}$ for the popular Gaussian kernel.
Our main result is a data structure with a significantly improved query time of $\approx 1/\mu^{0.05}$, at the expense of a somewhat higher space complexity of $\approx 1/\mu^{4.15}$. More generally, our techniques give the first known query time vs. space tradeoffs for KDE: for any $\delta\ge 0$ we can design a KDE data structure whose space scales as $1/\mu^{1+\delta}$ and whose query time scales as $1/\mu^{\xi(\delta)}$, where $\xi(\delta)$ is a non-increasing function of $\delta$. Importantly, in the linear-space regime, i.e., $\delta=0$, we obtain a query time of $1/\mu^{0.1865}$, improving upon the non-adaptive KDE bound of Charikar et al. (2020) and nearly matching the $\approx 1/\mu^{0.173}$ bound of Charikar et al. (2020) with a significantly simpler analysis.
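As context for the quantities above, here is a minimal Python sketch, not the paper's data structure: it computes the exact KDE value $\mu$ by brute force, and a uniform-sampling estimator whose $\approx 1/(\epsilon^2\mu)$ sample complexity is the standard $1/\mu$ baseline that the results above improve on. The Gaussian kernel form and all names and parameters are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(p, q):
    """Gaussian kernel K(p, q) = exp(-||p - q||^2); assumed form, for illustration."""
    return np.exp(-np.sum((p - q) ** 2))

def kde_exact(points, q):
    """mu = (sum_{p in P} K(p, q)) / n, computed by brute force in O(n) time."""
    return np.mean([gaussian_kernel(p, q) for p in points])

def kde_sample(points, q, num_samples, rng):
    """Monte Carlo estimate of mu from a uniform sample of the dataset.
    Standard concentration bounds give a (1 +/- eps)-approximation once
    num_samples is on the order of 1/(eps^2 * mu); the data structures in
    the paper aim to beat this 1/mu query cost."""
    idx = rng.integers(len(points), size=num_samples)
    return np.mean([gaussian_kernel(points[i], q) for i in idx])

rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 16))   # n = 10,000 points in R^16 (toy data)
q = rng.normal(size=16)             # a query point
print(kde_exact(P, q), kde_sample(P, q, num_samples=2_000, rng=rng))
```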
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 18197