Learning to Match Unpaired Data with Minimum Entropy Coupling

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: Minimum entropy coupling to pair continuous distributions
Abstract: Multimodal data is a precious asset enabling a variety of downstream tasks in machine learning. However, real-world data collected across different modalities is often not paired, which makes learning a joint distribution a significant challenge. A prominent approach to the modality coupling problem is Minimum Entropy Coupling (MEC), which seeks to minimize the joint entropy while satisfying constraints on the marginals. Existing approaches to the MEC problem focus on finite, discrete distributions, which limits their applicability to continuous data. In this work, we propose a novel method to solve the continuous MEC problem, using well-known generative diffusion models that learn to approximate and minimize the joint entropy through a cooperative scheme, while satisfying a relaxed version of the marginal constraints. We empirically demonstrate that our method, DDMEC, is general and can be easily used to address challenging tasks, including unsupervised single-cell multi-omics data alignment and unpaired image translation, outperforming specialized methods.
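For reference, the discrete MEC problem mentioned in the abstract can be written as follows. This is the standard textbook formulation, not the paper's continuous objective; the symbols $\pi$, $\Pi(p_X, p_Y)$, $p_X$, and $p_Y$ are notation assumed here for illustration.

```latex
% Discrete MEC: choose the joint distribution \pi with the prescribed marginals
% that has the smallest joint entropy.
\min_{\pi \in \Pi(p_X, p_Y)} H(\pi)
  = \min_{\pi \in \Pi(p_X, p_Y)} \; -\sum_{x, y} \pi(x, y) \log \pi(x, y),
\qquad
\Pi(p_X, p_Y) = \Bigl\{ \pi \ge 0 :
  \sum_{y} \pi(x, y) = p_X(x), \;
  \sum_{x} \pi(x, y) = p_Y(y) \Bigr\}.

% Because the marginals are fixed, H(X) and H(Y) are constants, so minimizing
% the joint entropy is equivalent to maximizing the mutual information:
I(X; Y) = H(X) + H(Y) - H(X, Y).
```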
Lay Summary: In the real world, we often gather data from different sources, such as images and text or various biological measurements, but these sources do not always align directly. For example, we might have many images and many captions, but not know which caption corresponds to which image. This makes it difficult for AI systems to learn how the different types of data relate to each other. This research addresses that problem using a method called Minimum Entropy Coupling (MEC), which aims to find the most organized way to connect two datasets while preserving the unique characteristics of each. However, previous versions of MEC only worked with simpler, discrete data. We introduce a new method called DDMEC, which uses a type of generative model known as diffusion models to connect more complex, continuous types of data, such as images or biological signals, even when they are unpaired. The method is both flexible and versatile. We evaluated DDMEC on challenging tasks such as matching different types of biological data and translating images from one domain to another, and it outperformed specialized tools designed specifically for those tasks.
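To make the idea of a low-entropy coupling concrete, the following is a minimal sketch of a greedy heuristic for the finite, discrete setting that earlier MEC work targets. It is not the DDMEC method, which handles continuous data with diffusion models, and it does not guarantee the true minimum. The function names `greedy_coupling` and `entropy_bits` are hypothetical and chosen only for this illustration.

```python
import heapq
import math


def greedy_coupling(p, q, tol=1e-12):
    """Greedy heuristic for a low-entropy coupling of two discrete marginals.

    Repeatedly pairs the largest remaining mass of p with the largest
    remaining mass of q and assigns the smaller of the two to that cell.
    Returns a dict mapping (i, j) to joint probability mass.
    """
    # heapq is a min-heap, so masses are stored negated to pop the largest first.
    heap_p = [(-m, i) for i, m in enumerate(p) if m > tol]
    heap_q = [(-m, j) for j, m in enumerate(q) if m > tol]
    heapq.heapify(heap_p)
    heapq.heapify(heap_q)

    joint = {}
    while heap_p and heap_q:
        neg_mp, i = heapq.heappop(heap_p)
        neg_mq, j = heapq.heappop(heap_q)
        mp, mq = -neg_mp, -neg_mq
        m = min(mp, mq)
        joint[(i, j)] = joint.get((i, j), 0.0) + m
        # Push back whatever mass is left over on either side.
        if mp - m > tol:
            heapq.heappush(heap_p, (-(mp - m), i))
        if mq - m > tol:
            heapq.heappush(heap_q, (-(mq - m), j))
    return joint


def entropy_bits(masses):
    """Shannon entropy in bits of an iterable of probability masses."""
    return -sum(m * math.log2(m) for m in masses if m > 0)


p = [0.5, 0.5]
q = [0.5, 0.25, 0.25]
joint = greedy_coupling(p, q)

# Entropy of the greedy coupling versus the independent coupling p(x) * q(y).
h_greedy = entropy_bits(joint.values())
h_independent = entropy_bits(a * b for a in p for b in q)
```

In this small example the greedy coupling concentrates mass on three cells, so its joint entropy is lower than that of the independent coupling, while both row and column sums still match `p` and `q`.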
Link To Code: https://github.com/MustaphaBounoua/ddmec
Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning
Keywords: Minimum entropy coupling, Unsupervised learning, Diffusion models
Submission Number: 6868