# Supplementary Material for "List-Level Distribution Coupling with Applications to Speculative Decoding and Lossy Compression"

## Appendix

The appendix is in `appendix.pdf`, and is intended to follow on directly from the main paper.

## Code

To replicate the Python environment used for the experiments, first install the appropriate version of PyTorch for your system from https://pytorch.org/get-started/locally/.
Then, go to the `code` folder and run
```
pip install -r requirements.txt
```
We provide code for the three experiments used in the paper:
1. **Multi-draft speculative decoding.** Executing `run_all.sh` in the `code/SpeculativeDecoding` directory will run all the tests used in the paper.
You may first need to make the script executable with `chmod +x run_all.sh`.
Before running it, download the model weights and associated files for Qwen2.5-0.5B-Instruct and Qwen2.5-7B-Instruct.
This can be done by executing the following commands in the `code/SpeculativeDecoding/model-weights` directory (Hugging Face stores the weight files with Git LFS, so `git lfs` must be installed for the clone to fetch them):
    - `git clone https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct`
    - `git clone https://huggingface.co/Qwen/Qwen2.5-7B-Instruct`
2. **Compression experiments on a synthetic Gaussian source.** Run `python3 test_gaussian.py` in the `code/GaussianSource` directory. We ran this script 10 times to obtain the results in the paper.
The results are not deterministic, so we also provide the actual data that we collected (see the section below).
3. **Distributed image compression on MNIST.** To train the models, run `python3 train_vae.py` in the `code/ImageCompression` directory.
This will populate the `model` folder with trained models.
Then, run `python3 mnist_experiment.py` to use the models in the compression experiments. We ran this script 5 times to obtain the results in the paper.
The results are not deterministic, so we also provide the actual data that we collected (see the section below).
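
Since experiments 2 and 3 are repeated several times, a small helper can automate the repetition. The following is a minimal sketch rather than part of the provided code: the function name `run_repeatedly` is hypothetical, and it assumes the scripts report their results on standard output, which is not stated here. If a script writes result files instead, those would need to be moved or renamed between runs.

```python
import subprocess
import sys


def run_repeatedly(command, n_runs, cwd=None):
    """Run `command` n_runs times in sequence and return each run's stdout."""
    outputs = []
    for _ in range(n_runs):
        # check=True raises CalledProcessError if a run exits with an error.
        result = subprocess.run(
            command, cwd=cwd, check=True, capture_output=True, text=True
        )
        outputs.append(result.stdout)
    return outputs


# Stand-in command so the sketch runs anywhere. For the Gaussian experiment one
# would instead pass [sys.executable, "test_gaussian.py"] with
# cwd="code/GaussianSource" and n_runs=10 (assumed invocation).
outputs = run_repeatedly([sys.executable, "-c", "print('ok')"], n_runs=3)
print(len(outputs))
```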

## Data

We provide the data generated by our experiments in the `data` directory.
In `SpeculativeDecoding`, results on i.i.d. drafts are found in `std` and those with diverse drafts are found in `tmp`.
For each dataset, we provide 5 sets of results, each using a different random seed.
These are in CSV format.
Each filename includes the model used, the type of test (`std` or `tmp2`), a timestamp and the dataset name.
In `GaussianSource`, we provide results over 10 runs of our code, split into those for our scheme and those for the baseline.
In `ImageCompression`, you can find results for 5 runs of our experiment.

To process the data and obtain the numbers reported in the paper, we provide a MATLAB script in each folder: `process_data_wz.m`, `process_data_mnist.m` and `process_data_sps.m`. Running a script computes the means and standard errors that we use to construct our tables.
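
For readers without MATLAB, the same kind of summary can be approximated in Python. The sketch below is not a port of the provided scripts: the column name `metric` and the inline example data are hypothetical, and it assumes the standard error is the sample standard deviation (with an n - 1 denominator) divided by the square root of the number of runs, which may differ from what the MATLAB scripts compute.

```python
import csv
import io
import math
import statistics


def mean_and_standard_error(values):
    """Return (mean, standard error of the mean) for per-run values."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def read_column(csv_file, column):
    """Read one numeric column from an open CSV file that has a header row."""
    return [float(row[column]) for row in csv.DictReader(csv_file)]


# Hypothetical data: one value per run in a column named "metric".
example = io.StringIO("run,metric\n1,2.0\n2,4.0\n3,6.0\n")
mean, sem = mean_and_standard_error(read_column(example, "metric"))
print(f"{mean:.3f} +/- {sem:.3f}")
```

To use it on the provided results, replace the in-memory example with `open(path, newline="")` on one of the CSV files and pass the name of the column of interest.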