Keywords: Kernel Density Estimation, Approximate Nearest Neighbor Search
TL;DR: We design data structures for Kernel Density Estimation with improved query time, as well as the first space vs. query time tradeoffs.
Abstract: In this paper we study the Kernel Density Estimation (KDE) problem: given a dataset $\mathcal{P}$ of $n$ points in Euclidean space and a kernel $K(p,q)$, prepare a low-space data structure that, given a query $q$, can quickly output a $(1\pm \epsilon)$-approximation to $\mu=\left(\sum_{p\in \mathcal{P}}K(p,q)\right)/n$. Recent advances have used tools from Locality Sensitive Hashing (LSH) and Approximate Nearest Neighbor (ANN) search to build KDE data structures with query time sublinear in $1/\mu$ and space linear in $1/\mu$, with Charikar et al. (2020) achieving the current best query time of $\approx 1/\mu^{0.173}$ for the popular Gaussian kernel.
Our main result is a data structure with a significantly improved query time of $\approx 1/\mu^{0.05}$, at the expense of a somewhat higher space complexity of $\approx 1/\mu^{4.15}$. More generally, our techniques give the first known query time vs. space tradeoffs for KDE: for any $\delta\ge 0$ we can design a KDE data structure whose space scales as $1/\mu^{1+\delta}$ and whose query time scales as $1/\mu^{\xi(\delta)}$, where $\xi(\delta)$ is a non-increasing function of $\delta$. Importantly, in the linear-space regime, i.e., $\delta=0$, we obtain a query time of $1/\mu^{0.1865}$, improving upon the non-adaptive KDE bound of Charikar et al. (2020) and nearly matching the $\approx 1/\mu^{0.173}$ bound of Charikar et al. (2020) with a significantly simpler analysis.
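As context for the quantities above, here is a minimal Python sketch, not the paper's data structure: it computes the exact KDE value $\mu$ by brute force, and a uniform-sampling estimator whose $\approx 1/(\epsilon^2\mu)$ sample complexity is the standard $1/\mu$ baseline that the results above improve on. The Gaussian kernel form and all names and parameters are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(p, q):
    """Gaussian kernel K(p, q) = exp(-||p - q||^2); assumed form, for illustration."""
    return np.exp(-np.sum((p - q) ** 2))

def kde_exact(points, q):
    """mu = (sum_{p in P} K(p, q)) / n, computed by brute force in O(n) time."""
    return np.mean([gaussian_kernel(p, q) for p in points])

def kde_sample(points, q, num_samples, rng):
    """Monte Carlo estimate of mu from a uniform sample of the dataset.
    Standard concentration bounds give a (1 +/- eps)-approximation once
    num_samples is on the order of 1/(eps^2 * mu); the data structures in
    the paper aim to beat this 1/mu query cost."""
    idx = rng.integers(len(points), size=num_samples)
    return np.mean([gaussian_kernel(points[i], q) for i in idx])

rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 16))   # n = 10,000 points in R^16 (toy data)
q = rng.normal(size=16)             # a query point
print(kde_exact(P, q), kde_sample(P, q, num_samples=2_000, rng=rng))
```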
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 18197