# Kernel Thinning
This Python package implements **kernel thinning**, a tool for compressing distributions while maintaining 
better-than-Monte Carlo integration error uniformly across a reproducing kernel Hilbert space.

More details on kernel thinning can be found in the manuscript [Kernel Thinning](https://arxiv.org/pdf/2105.05842.pdf).

```
@article{dwivedi2021kernel,
  title={Kernel Thinning},
  author={Raaz Dwivedi and Lester Mackey},
  journal={arXiv preprint arXiv:2105.05842},
  year={2021}
}
```

## Installation
To install the `kernelthinning` package, use the following pip command:
```
pip install kernelthinning
```

## Getting started
The primary kernel thinning function is `thin` in the `kt` module:
```python
from kernelthinning import kt

# X, m, split_kernel, and swap_kernel must be defined by the caller (see Args below)
coreset = kt.thin(X, m, split_kernel, swap_kernel, delta=0.5, seed=123, store_K=False)
# Returns kernel thinning coreset of size floor(n/2^m) as row indices into X
#
# Args:
#   X: Input sequence of sample points with shape (n, d)
#   m: Number of halving rounds
#   split_kernel: Kernel function used by KT-SPLIT (typically a square-root kernel, krt);
#     split_kernel(y, X) returns array of kernel evaluations between y and each row of X
#   swap_kernel: Kernel function used by KT-SWAP (typically the target kernel, k);
#     swap_kernel(y, X) returns array of kernel evaluations between y and each row of X
#   delta: Run KT-SPLIT with constant failure probabilities delta_i = delta/n
#   seed: Random seed to set prior to generation; if None, no seed will be set
#   store_K: If False, runs O(nd) space version which does not store kernel
#     matrix; if True, stores n x n kernel matrix
```
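
The kernel arguments described above can be illustrated with a Gaussian target kernel. The following is a minimal sketch, not the package's own code: the helper names `gauss_kernel`, `gauss_sqrt_kernel`, and `thin_gaussian_sample` are hypothetical, and the square-root kernel is written as a Gaussian with half the variance with its normalizing constant omitted (see the manuscript for the exact form). Only the `kt.thin` call follows the signature documented above.

```python
from functools import partial

import numpy as np


def gauss_kernel(y, X, var=1.0):
    """Target kernel k(y, x) = exp(-||y - x||^2 / (2 * var)) for each row x of X."""
    sq_dists = np.sum((X - y) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * var))


def gauss_sqrt_kernel(y, X, var=1.0):
    """Unnormalized square-root kernel: a Gaussian with half the variance (assumption)."""
    sq_dists = np.sum((X - y) ** 2, axis=1)
    return np.exp(-sq_dists / var)


def thin_gaussian_sample(X, m, var=1.0, seed=123):
    """Hypothetical wrapper: thin X with Gaussian kernels via kt.thin."""
    # Imported lazily so the kernel helpers work without the package installed
    from kernelthinning import kt

    split_kernel = partial(gauss_sqrt_kernel, var=var)
    swap_kernel = partial(gauss_kernel, var=var)
    return kt.thin(X, m, split_kernel, swap_kernel, delta=0.5, seed=seed, store_K=False)


rng = np.random.default_rng(0)
X = rng.standard_normal((64, 2))       # n = 64 points in d = 2 dimensions
m = 2                                  # two halving rounds
expected_size = X.shape[0] // 2 ** m   # floor(n / 2^m)
```

With the package installed, `coreset = thin_gaussian_sample(X, m)` should return `expected_size` row indices, and `X[coreset]` selects the corresponding points.
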
For usage examples, see the notebook `run_kt_experiment.ipynb`.
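
One way to check a coreset's quality is to compute the maximum mean discrepancy (MMD) between the full sample and the selected points. The sketch below is an illustration under stated assumptions rather than part of the package: the names `gauss_gram` and `mmd` are hypothetical, a Gaussian kernel is assumed, and the baseline shown is standard thinning (keeping every 2^m-th point). It builds full kernel matrices, so it is only suitable for small n.

```python
import numpy as np


def gauss_gram(A, B, var=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * var))


def mmd(X, coreset, var=1.0):
    """MMD between the empirical distributions of X and X[coreset]."""
    S = X[coreset]
    val = (
        gauss_gram(X, X, var).mean()
        - 2.0 * gauss_gram(X, S, var).mean()
        + gauss_gram(S, S, var).mean()
    )
    return np.sqrt(max(val, 0.0))


rng = np.random.default_rng(0)
X = rng.standard_normal((256, 2))
n, m = X.shape[0], 2

# Baseline: standard thinning keeps every 2^m-th point (n / 2^m points in total)
standard_idx = np.arange(n)[:: 2 ** m]
mmd_standard = mmd(X, standard_idx)
```

The same `mmd` helper accepts the index array returned by `kt.thin`, so the two coresets can be compared on equal footing.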

## Reproducing paper vignettes

1. The script `submit_jobs_run_kt.py` reproduces the vignette experiments of [Kernel Thinning](https://arxiv.org/pdf/2105.05842.pdf) on a Slurm cluster by executing `run_kt_experiment.ipynb` with appropriate parameters. For the MCMC examples, it assumes that the necessary data has been downloaded and pre-processed following the steps in `PreProcess_MCMC_Data.ipynb`.
2. After all results have been generated, the notebook `plot_results.ipynb` can be used to reproduce the figures of [Kernel Thinning](https://arxiv.org/pdf/2105.05842.pdf).
