Keywords: Distribution compression, linear time, thinning, i.i.d. sampling, Markov chain Monte Carlo, maximum mean discrepancy, reproducing kernel Hilbert space
TL;DR: We introduce a simple algorithm for compressing an $n$-point summary of a probability distribution into a $\sqrt{n}$-point summary of comparable quality in $O(n \log^2 n)$ time.
Abstract: In distribution compression, one aims to accurately summarize a probability distribution $P$ using a small number of representative points. Near-optimal thinning procedures achieve this goal by sampling $n$ points from a Markov chain and identifying $\sqrt{n}$ points with $\tilde{O}(1/\sqrt{n})$ distributional discrepancy to $P$. Unfortunately, these same algorithms suffer from quadratic or super-quadratic runtime in the sample size $n$. To address this deficiency, we introduce a simple meta-procedure, Compress++, for speeding up any input thinning algorithm while suffering at most a factor of four in error. When combined with the quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and Mackey (2021), Compress++ delivers $\sqrt{n}$ points with $O(\sqrt{\log n / n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy in $O(n \log^3 n)$ time and $O(\sqrt{n} \log^2 n)$ space. Moreover, Compress++ enjoys the same near-linear runtime given any quadratic-time input and reduces the runtime of super-quadratic algorithms by a square-root factor. In our benchmarks with high-dimensional Monte Carlo samples and long-running Markov chains targeting challenging differential equation posteriors, Compress++ matches or nearly matches the accuracy of its input algorithm in orders of magnitude less time.
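The abstract does not spell out how Compress++ obtains its speedup. As a minimal sketch, assuming a recursive divide-and-halve organization (our illustrative assumption, not necessarily the paper's exact procedure), the code below wraps a generic halving routine: it splits the input into four equal blocks, compresses each block recursively, and halves the union of the results. The names `compress`, `random_halve`, `gaussian_kernel`, and `mmd`, the parameter `g`, and the kernel bandwidth are all hypothetical. `random_halve` is only a placeholder for a quadratic-time routine such as kernel halving, so the resulting summary should be expected to reach roughly Monte Carlo quality rather than the better-than-Monte-Carlo discrepancy claimed for kernel halving. The `mmd` helper computes the empirical maximum mean discrepancy between two point sets with a Gaussian kernel.

```python
import numpy as np


def random_halve(points, rng):
    """Placeholder halving routine: keep a uniformly random half of the points.

    Hypothetical stand-in for a quadratic-time halving algorithm such as
    kernel halving; any function mapping m points to m // 2 points works here.
    """
    idx = rng.choice(len(points), size=len(points) // 2, replace=False)
    return points[np.sort(idx)]


def compress(points, halve, g, rng):
    """Hypothetical recursive divide-and-halve wrapper around `halve`.

    Assumes len(points) is a power of 4 and at least 4**g. Splits the input
    into four equal blocks, compresses each block recursively, concatenates
    the results, and halves the union. Returns 2**g * sqrt(n) points.
    """
    n = len(points)
    if n == 4 ** g:  # base case: 4**g = 2**g * sqrt(4**g) points already
        return points
    blocks = np.split(points, 4)  # four contiguous blocks of size n / 4
    merged = np.concatenate([compress(b, halve, g, rng) for b in blocks])
    return halve(merged, rng)  # 2**(g + 1) * sqrt(n) -> 2**g * sqrt(n)


def gaussian_kernel(x, y, bandwidth=1.0):
    """Matrix of Gaussian kernel values k(x_i, y_j)."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-sq / (2 * bandwidth**2))


def mmd(x, y, bandwidth=1.0):
    """Empirical MMD between the uniform distributions on point sets x and y."""
    sq_mmd = (gaussian_kernel(x, x, bandwidth).mean()
              + gaussian_kernel(y, y, bandwidth).mean()
              - 2 * gaussian_kernel(x, y, bandwidth).mean())
    return np.sqrt(max(sq_mmd, 0.0))  # clip tiny negative round-off


rng = np.random.default_rng(0)
n, g = 4 ** 5, 1  # 1024 input points, hypothetical oversampling parameter g = 1
x = rng.standard_normal((n, 2))
coreset = compress(x, random_halve, g, rng)
print(coreset.shape)  # (64, 2): 2**g * sqrt(n) = 2 * 32
print(mmd(x, coreset))
```

Under this sketch, recursion level $i$ makes $4^i$ calls to `halve`, each on about $2^{g+1}\sqrt{n/4^i}$ points, so a quadratic-time halving routine costs roughly $4^{g+1} n$ operations per level across at most $\log_4 n$ levels. This is one way such a wrapper can turn a quadratic-time input into a near-linear-time procedure when $4^g$ grows only polylogarithmically in $n$.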
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/distribution-compression-in-near-linear-time/code)