Abstract: We construct near-optimal coresets for kernel density estimates for points in $$\mathbb{R}^d$$ when the kernel is positive definite. Specifically, we provide a polynomial-time construction for a coreset of size $$O(\sqrt{d}/\varepsilon \cdot \sqrt{\log 1/\varepsilon})$$, and we show a near-matching lower bound of size $$\Omega(\min\{\sqrt{d}/\varepsilon, 1/\varepsilon^2\})$$. When $$d \ge 1/\varepsilon^2$$, it is known that the coreset size can be $$O(1/\varepsilon^2)$$. The upper bound is a polynomial-in-$$(1/\varepsilon)$$ improvement when $$d \in [3, 1/\varepsilon^2)$$, and the lower bound is the first known lower bound to depend on $$d$$ for this problem. Moreover, the upper bound's restriction that the kernel is positive definite is significant in that it applies to a wide variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel, which can be negative.
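To make the quantity being bounded concrete, here is a minimal Python sketch of what a coreset for a kernel density estimate is measured against. It is not the construction from the abstract. It assumes the usual setup in which the estimate of a point set is the average kernel value to a query and a coreset is a subset whose estimate stays within $$\varepsilon$$ of the full one at every query. The Gaussian kernel, the `bandwidth` parameter, the function names, and the uniform random subsample are all illustrative assumptions, and the maximum is taken over a finite set of test queries, so it only lower-bounds the true worst-case error.

```python
import numpy as np


def gaussian_kde(points, queries, bandwidth=1.0):
    """Average Gaussian kernel value between each query and all points.

    The Gaussian kernel is an assumed example of a positive definite kernel.
    """
    # Pairwise squared distances, shape (num_queries, num_points).
    sq_dists = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1)


def empirical_error(points, subset, queries, bandwidth=1.0):
    """Largest |KDE_P(x) - KDE_Q(x)| over the supplied queries.

    This is only a lower bound on the error over all of R^d.
    """
    full = gaussian_kde(points, queries, bandwidth)
    approx = gaussian_kde(subset, queries, bandwidth)
    return float(np.abs(full - approx).max())


rng = np.random.default_rng(0)
d, n, m = 3, 2000, 200                  # dimension, input size, subset size
P = rng.normal(size=(n, d))             # synthetic input points
Q = P[rng.choice(n, size=m, replace=False)]  # uniform subsample (baseline only)
X = rng.normal(size=(500, d))           # finite set of test queries

err = empirical_error(P, Q, X)
print(f"subset size {m}: empirical max error {err:.4f}")
```

Uniform subsampling is used here only as a simple baseline candidate for `Q`. The size bounds stated in the abstract concern how small such a subset can be made while keeping this error below $$\varepsilon$$.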