Categorical Schrödinger Bridge Matching

Published: 01 May 2025, Last Modified: 18 Jun 2025ICML 2025 posterEveryoneRevisionsBibTeXCC BY 4.0
TL;DR: This paper extends the Schrödinger Bridge problem to work with discrete time and spaces.
Abstract: The Schrödinger Bridge (SB) is a powerful framework for solving generative modeling tasks such as unpaired domain translation. Most SB-related research focuses on continuous data space $\mathbb{R}^{D}$ and leaves open theoretical and algorithmic questions about applying SB methods to discrete data, e.g, on finite spaces $\mathbb{S}^{D}$. Notable examples of such sets $\mathbb{S}$ are codebooks of vector-quantized (VQ) representations of modern autoencoders, tokens in texts, categories of atoms in molecules, etc. In this paper, we provide a theoretical and algorithmic foundation for solving SB in discrete spaces using the recently introduced Iterative Markovian Fitting (IMF) procedure. Specifically, we theoretically justify the convergence of discrete-time IMF (D-IMF) to SB in discrete spaces. This enables us to develop a practical computational algorithm for SB, which we call Categorical Schrödinger Bridge Matching (CSBM). We show the performance of CSBM via a series of experiments with synthetic data and VQ representations of images. The code of CSBM is available at [this repository](https://github.com/gregkseno/csbm).
Lay Summary: The task of bridging two domains without paired data is a promising direction in generative modeling. One of the approaches we focus on in this work is the Schrödinger Bridge (SB) problem. The SB problem "optimally" couples two data domains by connecting them through a stochastic process, with the notion of optimality varying depending on the choice of reference process. Unfortunately, existing methods for solving the SB problem are primarily designed for continuous data, such as images. This leaves a significant gap in applying SB methods to discrete domains, including text, discrete latent spaces of certain autoencoders, and molecular graphs with discrete nodes and edges. Moreover, the only current discrete SB method, DDSBM, is focused on a continuous-time formulation, which seems to lack some of the desirable properties, such as the theoretical ability to perform generation in a few steps. To address both challenges, we propose Categorical Schrödinger Bridge Matching (CSBM), a method designed to extend the SB framework to discrete space and time settings. The main goal of our work is to prove the existence of the solution in selected settings. We tackle this by leveraging a specific characterization of the SB, which guarantees the uniqueness of the solution. This result allows us to directly apply an established iterative procedure, discrete-time Iterative Markovian Fitting, to find the solution. Moreover, we introduce a couple of reference processes that define optimality under both Euclidean distance and Hamming distance. Through experiments, we demonstrate how different choices of reference processes affect the solution; highlight the theoretically grounded ability of CSBM to operate with a small number of steps; and showcase its applicability to tasks such as high-resolution image translation and text style transfer. The contribution of our work lies not only in addressing the gap in solving the Schrödinger Bridge problem in discrete space and time, but also in introducing key aspects relevant to discrete domains, such as the choice of reference processes (noting that commonly used Brownian motion is unsuitable for discrete spaces) and the theoretically grounded ability to accelerate inference by reducing the number of diffusion steps.
Link To Code: https://github.com/gregkseno/csbm
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Schrödinger Bridge, Entropic Optimal Transport, Optimal transport, Unpaired Learning, Discrete space
Submission Number: 10636
Loading