A Kernel Distribution Closeness Testing

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: hypothesis testing, Maximum Mean Discrepancy, distribution closeness testing, two-sample testing
Abstract: \emph{Distribution closeness testing} (DCT) assesses whether the distance between an unknown pair of distributions is at least $\epsilon$-far; in practice, $\epsilon$ can be defined as the distance between a known reference pair of distributions. However, existing DCT methods mainly measure discrepancies between distributions defined on discrete one-dimensional spaces (e.g., total variation on a discrete one-dimensional space), which limits the applicability of DCT to complex data (e.g., images). To make DCT applicable to complex data, a natural idea is to introduce the \emph{maximum mean discrepancy} (MMD), a powerful measure of the difference between two complex distributions, into DCT scenarios. Nonetheless, in this paper, we find that the MMD value is less informative when assessing the closeness levels of multiple distribution pairs under the same kernel, i.e., the MMD value can be the same for many pairs of distributions that have different norms in the same \emph{reproducing kernel Hilbert space} (RKHS). To mitigate this issue, we propose a new kernel DCT with the \emph{norm-adaptive MMD} (NAMMD), which scales MMD by the norms of the distributions and is effective for kernels $\kappa(\bm{x},\bm{x}')=\Psi(\bm{x}-\bm{x}')\leq K$ with a positive-definite $\Psi(\cdot)$ and $\Psi(\bm{0})=K$. Theoretically, we prove that our NAMMD test achieves higher test power than the MMD test, along with an asymptotic distribution analysis. We also present upper bounds on the sample complexity of our NAMMD test and prove that its Type-I error is controlled. Finally, we conduct experiments to validate the effectiveness of our NAMMD test.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6378