Flash-SD-KDE: Accelerating SD-KDE with Tensor Cores
Keywords: Kernel Density Estimation, Score Debiasing, Tensor Cores, SD-KDE, GPU Acceleration
TL;DR: By exploiting GPU Tensor Cores, we make score-debiased kernel density estimation practical at large scale
Abstract: Score-debiased kernel density estimation (SD-KDE) achieves improved asymptotic convergence rates over classical KDE, but its use of an empirical score has made it significantly slower in practice.
We show that by re-ordering the SD-KDE computation to expose matrix-multiplication structure, Tensor Cores can be used to accelerate the GPU implementation.
On a 32k-sample 16-dimensional problem, our approach runs up to $47\times$ faster than a strong SD-KDE GPU baseline and $3{,}300\times$ faster than scikit-learn's KDE.
On a larger 1m-sample 16-dimensional task evaluated on 131k queries, Flash-SD-KDE completes in $2.3$ s on a single GPU, making score-debiased density estimation practical at previously infeasible scales.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 203
Loading