Closeness testing from distributed measurements

Published: 18 Dec 2025, Last Modified: 21 Feb 2026ALT 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: distribution testing, distributed inference, communication constraints, two-sample testing, closeness testing
Abstract: We consider the fundamental task of two-sample composite hypothesis testing (i.e., "closeness testing") in a distributed setting, where a central party holds a dataset of $M \gg 1$ observations from an unknown discrete probability distribution $q$ over a universe of size $k$, and individual parties each independently observes one realization from an unknown distribution $p$. The goal is for the central party to test whether $p$ and $q$ are equal, or differ significantly in statistical distance, while only receiving a small amount of information (at most $\ell \leq \log_2 k$ bits) from each of the $n$ distributed entities. Our main contribution is a time- and sample-efficient algorithm for this task, applicable across the whole regime of parameters. Our theoretical guarantees match the optimal sample complexities in the specific cases already studied in the literature, e.g., when $\ell = \log_2 k$ (no information constraint) or $M \to \infty$ (reference distribution fully known to the central party).
Submission Number: 25
Loading