TL;DR: We flow datasets using Wasserstein over Wasserstein gradient flows.
Abstract: Many applications in machine learning involve data represented as probability distributions. The emergence of such data requires radically novel techniques to design tractable gradient flows on probability distributions over such (infinite-dimensional) objects. For instance, being able to flow labeled datasets is a core task for applications ranging from domain adaptation to transfer learning and dataset distillation. In this setting, we propose to represent each class by the associated conditional distribution of features, and to model the dataset as a mixture distribution supported on these classes (which are themselves probability distributions), so that labeled datasets can be seen as probability distributions over probability distributions. We endow this space with a metric structure from optimal transport, namely the Wasserstein over Wasserstein (WoW) distance, derive a differential structure on this space, and define WoW gradient flows. The latter make it possible to design dynamics on this space that decrease a given objective functional. We apply our framework to transfer learning and dataset distillation tasks, leveraging our gradient flow construction as well as novel tractable functionals that take the form of Maximum Mean Discrepancies with Sliced-Wasserstein-based kernels between probability distributions.
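Illustrative sketch: to make the kind of functional mentioned in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' released implementation; see the code link below) of a squared MMD between two datasets, each represented as a list of per-class feature samples, using a Gaussian kernel built on a Monte-Carlo estimate of the Sliced-Wasserstein distance. The kernel form, bandwidth, number of projections, and all function names here are illustrative assumptions.

```python
import numpy as np

def sliced_wasserstein2(X, Y, n_proj=64, n_quantiles=100, seed=0):
    """Monte-Carlo estimate of the squared Sliced-Wasserstein-2 distance between
    two point clouds X (n, d) and Y (m, d), viewed as uniform empirical measures."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    thetas = rng.normal(size=(n_proj, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # random directions on the sphere
    qs = np.linspace(0.0, 1.0, n_quantiles)
    sw2 = 0.0
    for theta in thetas:
        # 1-D optimal transport along a projection reduces to comparing quantile functions
        qx = np.quantile(X @ theta, qs)
        qy = np.quantile(Y @ theta, qs)
        sw2 += np.mean((qx - qy) ** 2)
    return sw2 / n_proj

def sw_gaussian_kernel(mu, nu, sigma=1.0):
    """Gaussian kernel between distributions: k(mu, nu) = exp(-SW_2^2(mu, nu) / (2 sigma^2))."""
    return np.exp(-sliced_wasserstein2(mu, nu) / (2.0 * sigma**2))

def mmd2_between_datasets(classes_P, classes_Q, sigma=1.0):
    """Squared MMD between two datasets, each given as a list of per-class feature
    arrays, i.e. empirical distributions over (per-class) distributions."""
    k = lambda a, b: sw_gaussian_kernel(a, b, sigma=sigma)
    kPP = np.mean([k(a, b) for a in classes_P for b in classes_P])
    kQQ = np.mean([k(a, b) for a in classes_Q for b in classes_Q])
    kPQ = np.mean([k(a, b) for a in classes_P for b in classes_Q])
    return kPP + kQQ - 2.0 * kPQ

# Toy usage: two 3-class datasets of 2-D features, shifted relative to each other.
rng = np.random.default_rng(1)
P = [rng.normal(loc=i, scale=1.0, size=(50, 2)) for i in range(3)]
Q = [rng.normal(loc=i + 0.5, scale=1.0, size=(50, 2)) for i in range(3)]
print("squared MMD between datasets:", mmd2_between_datasets(P, Q))
```

In a gradient-flow setting, a functional of this form would be decreased by moving the support points of one dataset, but the update rule itself is part of the paper's method and is not reproduced here.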
Lay Summary: Many modern machine learning tasks involve data that is best understood not as individual points, but as probability distributions. For example, in labeled datasets, each class (like "cat" or "dog") can be seen as its own distribution of examples, making the entire dataset a collection of distributions, or a “distribution of distributions”. This layered structure is powerful, but traditional learning tools aren’t designed to handle it.
We introduce a new mathematical framework that makes it possible to smoothly transform one such dataset into another, by treating datasets as objects in a geometric space and defining meaningful ways to move through that space. Our approach builds on ideas from optimal transport, a field focused on comparing and moving distributions in the most efficient way.
This new perspective enables models to better adapt to new tasks and to compress datasets into smaller sets that still capture the key information needed for learning. More broadly, it offers a foundation for working directly with richer, more structured data in a principled and scalable way.
Link To Code: https://github.com/clbonet/Flowing_Datasets_with_WoW_Gradient_Flows
Primary Area: Probabilistic Methods->Everything Else
Keywords: Wasserstein gradient flows, optimal transport, datasets
Submission Number: 2603