Bandit Optimal Transport

Published: 17 Jul 2025, Last Modified: 06 Sept 2025 | EWRL 2025 Poster | CC BY 4.0
Keywords: Bandits; Optimal Transport; Non-parametric estimation
TL;DR: Are infinite-dimensional optimal transport problems bandit-learnable? Yes, and optimal transport's regularity properties make it surprisingly easy.
Abstract: Despite the impressive progress in statistical Optimal Transport (OT) in recent years, there has been little interest in the study of the \emph{sequential learning} of OT. This is surprising, as the problem is both practically motivated and a challenging extension of existing settings such as linear bandits. This article considers (for the first time) the stochastic bandit problem of learning to solve generic Kantorovich OT problems from repeated interactions when the marginals are known but the cost is unknown. By exploiting the intrinsic regularity of the OT problem, we show that this problem satisfies classical Hilbert space bandit regret guarantees ($\tilde{\mathcal O}(\sqrt{T})$ multiplied by log-determinant terms). To deal with learning in infinite dimension, we provide a functional regression method which can exploit the intrinsic regularity of the cost to obtain complete regret bounds interpolating between $\tilde{\mathcal O}(\sqrt{T})$ (finite and parametric cases) and ${\mathcal O}(T)$ (unlearnable costs).
Serve As Reviewer: ~Lorenzo_Croissant1
Track: Regular Track: unpublished work
Submission Number: 3