Conditional Distribution Compression via the Kernel Conditional Mean Embedding

Dominic Broadbent; Nick Whiteley; Robert F Allison; Tom Lovett

Published: 18 Sept 2025, Last Modified: 29 Oct 2025. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: compression, herding, distribution compression, conditional distribution, joint distribution, inducing points, kernel conditional mean embedding, kernel mean embedding, RKHS, reproducing kernel Hilbert space

TL;DR: We extend the concept of distribution compression to joint and conditional distributions, introducing algorithms to produce compressed sets that approximate the joint or conditional distribution of a target dataset.

Abstract: Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of *labelled* data. To address this gap, we first introduce the *Average Maximum Conditional Mean Discrepancy* (AMCMD), a metric for comparing conditional distributions, and derive a closed form estimator. Next, we make a key observation: in the context of distribution compression, the cost of constructing a compressed set targeting the AMCMD can be reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n)$. Leveraging this, we extend KH to propose Average Conditional Kernel Herding (ACKH), a linear-time greedy algorithm for constructing compressed sets that target the AMCMD. To better understand the advantages of *directly* compressing the conditional distribution rather than doing so via the joint distribution, we introduce *Joint Kernel Herding* (JKH), an adaptation of KH designed to compress the joint distribution of labelled data. While herding methods provide a simple and interpretable selection process, they rely on a greedy heuristic. To explore alternative optimisation strategies, we also propose *Joint Kernel Inducing Points* (JKIP) and *Average Conditional Kernel Inducing Points* (ACKIP), which *jointly* optimise the compressed set while maintaining linear complexity. Experiments show that directly preserving conditional distributions with ACKIP outperforms both joint distribution compression and the greedy selection used in ACKH. Moreover, we see that JKIP consistently outperforms JKH.
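For readers unfamiliar with the greedy selection that the abstract builds on, the following is a minimal sketch of classic Kernel Herding on unlabelled data, with candidates restricted to the dataset itself. It is not the paper's ACKH, JKH, JKIP, or ACKIP, and it does not reproduce the linear-time cost discussed above: precomputing the mean embedding this way costs $\mathcal{O}(n^2)$ kernel evaluations. The function names, the Gaussian kernel, and the `lengthscale` parameter are illustrative assumptions, not taken from the paper.

```python
import math


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats.

    The kernel choice and lengthscale are assumptions made for illustration.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def kernel_herding(data, m, kernel=gaussian_kernel):
    """Greedily select m indices of `data` so that the selected points'
    empirical kernel mean embedding tracks that of the full dataset.

    Candidates are the data points themselves, and a point may in
    principle be selected more than once.
    """
    n = len(data)
    # Empirical kernel mean embedding evaluated at every candidate point.
    mean_embedding = [sum(kernel(x, y) for y in data) / n for x in data]
    # Running sum of k(candidate, s) over the points s selected so far.
    selected_sum = [0.0] * n
    selected = []
    for t in range(m):
        # Herding objective: attraction to the data mean embedding minus
        # a repulsion term from the points already selected.
        scores = [mean_embedding[i] - selected_sum[i] / (t + 1) for i in range(n)]
        best = max(range(n), key=scores.__getitem__)
        selected.append(best)
        for i in range(n):
            selected_sum[i] += kernel(data[i], data[best])
    return selected


if __name__ == "__main__":
    # Toy one-dimensional dataset: a tight cluster near 0 and one point at 5.
    toy_data = [[0.0], [0.1], [0.2], [5.0]]
    print(kernel_herding(toy_data, 2))
```

On the toy data, the first selection lands in the dense cluster and the repulsion term then pushes the second selection towards the isolated point, which is the behaviour the greedy heuristic is meant to exhibit.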

Supplementary Material: zip

Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)

Submission Number: 12378
