Kernel Quantile Embeddings and Associated Probability Metrics

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY-NC-SA 4.0
TL;DR: We propose quantiles in RKHSs, and associated distances that are efficient to estimate, metrise probability distributions under weaker conditions than MMD, and can be seen as a kernelised generalisation of the sliced Wasserstein distance. We demonstrate strong performance in hypothesis testing.
Abstract: Embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) has enabled powerful nonparametric methods such as the maximum mean discrepancy (MMD), a statistical distance with strong theoretical and computational properties. At its core, the MMD relies on kernel mean embeddings to represent distributions as mean functions in RKHS. However, it remains unclear if the mean function is the only meaningful RKHS representation. Inspired by generalised quantiles, we introduce the notion of *kernel quantile embeddings (KQEs)*. We then use KQEs to construct a family of distances that: (i) are probability metrics under weaker kernel conditions than MMD; (ii) recover a kernelised form of the sliced Wasserstein distance; and (iii) can be efficiently estimated with near-linear cost. Through hypothesis testing, we show that these distances offer a competitive alternative to MMD and its fast approximations.
Lay Summary: Many modern techniques compare complex datasets by representing probability distributions in a flexible space of functions, and measuring their difference using the maximum mean discrepancy (MMD). However, this method reduces each distribution to its average representation, which may miss important details or require strict conditions to work reliably. To address this, we draw inspiration from quantiles—values that divide data evenly—and introduce kernel quantile embeddings (KQEs) as a richer way to capture the shape of a distribution. We develop a consistent estimator for these embeddings and use them to define a new family of distance measures between distributions. These distances can tell distributions apart under milder assumptions than those for the MMD, and they also recover a kernel-based version of the sliced Wasserstein distance, linking two influential statistical frameworks. Importantly, our measures can be computed with scalable, near-linear cost, making them practical for large datasets. Through hypothesis testing, we show that these new distances perform competitively against MMD and its fast approximations. By moving beyond simple averages, our work offers a powerful and flexible alternative for comparing probability distributions and opens the door to new research in data analysis and machine learning.
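To make the near-linear cost and the sliced Wasserstein connection concrete, the following is a minimal Python sketch of a quantile-based, kernelised sliced-Wasserstein-style distance. It is an illustration under stated assumptions (directions taken as Gaussian-kernel features centred at randomly sampled anchor points, and quantiles compared via the one-dimensional Wasserstein distance), not the paper's exact KQE estimator; the linked repository below contains the authors' implementation.

import numpy as np

def gaussian_kernel(X, z, lengthscale=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * lengthscale^2)), evaluated for each row of X
    sq_dists = np.sum((X - z) ** 2, axis=1)
    return np.exp(-sq_dists / (2 * lengthscale ** 2))

def sliced_kernel_quantile_distance(X, Y, num_directions=50, p=2, seed=None):
    # Hypothetical illustration: project both samples onto kernel features
    # k(., z) centred at randomly chosen anchor points z, then compare the
    # empirical quantile functions of the projections (a 1D p-Wasserstein
    # distance), averaging over directions. Cost is O(num_directions * n log n).
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    levels = (np.arange(100) + 0.5) / 100  # quantile levels at which to compare
    total = 0.0
    for _ in range(num_directions):
        z = pooled[rng.integers(len(pooled))]            # random anchor point
        qx = np.quantile(gaussian_kernel(X, z), levels)  # quantiles of projected X
        qy = np.quantile(gaussian_kernel(Y, z), levels)  # quantiles of projected Y
        total += np.mean(np.abs(qx - qy) ** p)
    return (total / num_directions) ** (1.0 / p)

# Toy usage: samples from the same distribution vs. a shifted one
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
X2 = rng.normal(0.0, 1.0, size=(500, 2))
Y = rng.normal(0.5, 1.0, size=(500, 2))
print(sliced_kernel_quantile_distance(X, X2, seed=1))  # small: same distribution
print(sliced_kernel_quantile_distance(X, Y, seed=1))   # larger: distributions differ

Each direction requires only kernel evaluations and a sort, which is what gives the near-linear per-direction cost mentioned in the abstract.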
Link To Code: https://github.com/MashaNaslidnyk/kqe
Primary Area: General Machine Learning->Kernel methods
Keywords: kernel methods, probability metrics, quantiles
Submission Number: 6510