Keywords: probability, sampling, compression, speculative decoding, multi-draft speculative sampling, large language models
TL;DR: We introduce a technique for coupling probability distributions when several samples are available from one of the distributions, and give applications to multi-draft speculative decoding and distributed lossy compression with side information.
Abstract: We study a relaxation of the problem of coupling probability distributions: a list of samples is generated from one distribution, and an *accept* is declared if any one of these samples matches the single sample generated from the other distribution.
We propose a novel method for generating samples that extends the Gumbel-max sampling suggested in Daliri et al. (2025) for coupling probability distributions. We also establish a corresponding lower bound on the acceptance probability, which we call the *list matching lemma*.
We next discuss two applications of our setup.
First, we develop a new mechanism for multi-draft speculative sampling that is simple to implement and achieves performance competitive with baselines such as SpecTr and SpecInfer across a range of language tasks.
Our method also guarantees a certain degree of *drafter invariance* with respect to the output tokens, a property not supported by existing schemes.
We also provide a theoretical lower bound on the token-level acceptance probability.
As our second application, we consider distributed lossy compression with side information in a setting where a source sample is compressed and available to multiple decoders, each with independent side information.
We propose a compression technique that is based on our generalization of Gumbel-max sampling and show that it provides significant gains in experiments involving synthetic Gaussian sources and the MNIST image dataset.
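To make the coupling idea concrete, below is a minimal NumPy sketch of one natural way to extend Gumbel-max coupling to a list of drafts. The function names, the toy distributions, and the specific choice of reusing the first shared Gumbel vector for the target sample are illustrative assumptions for exposition, not the paper's exact construction.

```python
import numpy as np

def gumbel_max(log_probs, gumbel_noise):
    """Gumbel-max trick: argmax(log p + G), with G i.i.d. Gumbel(0,1), is an exact sample from p."""
    return int(np.argmax(log_probs + gumbel_noise))

def list_coupled_sample(p, q, num_drafts, rng):
    """Draw `num_drafts` samples from p and one sample from q using shared Gumbel noise.

    Illustrative scheme only: each draft i uses its own Gumbel vector G_i, and the
    target sample from q reuses G_0, so every marginal stays exact while the chance
    that the target appears in the draft list is boosted.
    """
    gumbels = rng.gumbel(size=(num_drafts, len(p)))       # shared randomness
    drafts = [gumbel_max(np.log(p), g) for g in gumbels]  # i.i.d. samples from p
    target = gumbel_max(np.log(q), gumbels[0])            # exact sample from q, coupled to draft 0
    return drafts, target

# Monte Carlo comparison of the "list matching" acceptance rate with and without coupling.
rng = np.random.default_rng(0)
p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])   # toy draft distribution
q = np.array([0.35, 0.30, 0.20, 0.10, 0.05])   # toy target distribution
trials, k = 20_000, 4

coupled = sum(t in d for d, t in (list_coupled_sample(p, q, k, rng) for _ in range(trials)))
independent = sum(
    rng.choice(len(q), p=q) in rng.choice(len(p), size=k, p=p) for _ in range(trials)
)
print(f"accept rate  coupled: {coupled / trials:.3f}   independent: {independent / trials:.3f}")
```

On this toy pair of close distributions, the coupled acceptance rate comes out noticeably higher than the independent baseline; the gap between these two rates is the kind of quantity the list matching lemma is meant to lower-bound.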
Supplementary Material: zip
Primary Area: Probabilistic methods (e.g., variational inference, causal inference, Gaussian processes)
Submission Number: 18712