Efficient and Accurate Optimal Transport with Mirror Descent and Conjugate Gradients

Mete Kemertas; Allan Douglas Jepson; Amir-massoud Farahmand

Published: 03 Jun 2025, Last Modified: 03 Jun 2025. Accepted by TMLR. CC BY 4.0

Abstract: We propose Mirror Descent Optimal Transport (MDOT), a novel method for solving discrete optimal transport (OT) problems with high precision by unifying temperature annealing in entropic-regularized OT (EOT) with mirror descent techniques. In this framework, temperature annealing produces a sequence of EOT dual problems whose solutions gradually approach the solution of the original OT problem. We solve each problem efficiently using a GPU-parallel nonlinear conjugate gradients algorithm (PNCG) that outperforms traditional Sinkhorn iterations under weak regularization. Our investigation also reveals that the theoretical convergence rate of Sinkhorn iterations can exceed existing non-asymptotic bounds when the stopping criterion is tuned in a manner analogous to MDOT. Comprehensive ablation studies of MDOT-PNCG affirm its robustness across a wide range of algorithmic parameters. Benchmarking on 24 problem sets of size $n=4096$ in a GPU environment demonstrates that our method attains high-precision, feasible solutions significantly faster than a representative set of existing OT solvers, including accelerated gradient methods and advanced Sinkhorn variants, in both wall-clock time and number of operations. Empirical convergence rates range between $O(n^2 \varepsilon^{-1/4})$ and $O(n^2 \varepsilon^{-1})$, where $\varepsilon$ is the optimality gap. For problem sizes up to $n=16\,384$, the empirical runtime scales as $\widetilde{O}(n^2)$ for moderate precision and as $\widetilde{O}(n^{5/2})$ at worst for high precision. These findings establish MDOT-PNCG as a compelling alternative to current OT solvers, particularly in challenging weak-regularization regimes.
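The annealing idea in the abstract can be illustrated with a small sketch. The code below is not MDOT or PNCG: it uses plain log-domain Sinkhorn iterations as the inner solver and only shows how a decreasing temperature, with dual potentials warm-started between stages, yields a sequence of EOT problems whose plans approach an OT solution. The function names (`sinkhorn_log`, `annealed_ot`), the halving schedule, and the row-marginal stopping rule are assumptions made for illustration and are not taken from the paper.

```python
import math

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) for a list of floats.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def plan(C, f, g, eps):
    # Transport plan implied by dual potentials f, g at temperature eps.
    return [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(len(g))]
            for i in range(len(f))]

def sinkhorn_log(C, r, c, eps, f, g, tol=1e-9, max_iter=10_000):
    # Log-domain Sinkhorn for one EOT problem; r and c must be strictly positive.
    n, m = len(r), len(c)
    for _ in range(max_iter):
        f = [eps * (math.log(r[i])
                    - logsumexp([(g[j] - C[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        g = [eps * (math.log(c[j])
                    - logsumexp([(f[i] - C[i][j]) / eps for i in range(n)]))
             for j in range(m)]
        # Columns match c after the g update, so only the row error is checked.
        P = plan(C, f, g, eps)
        if sum(abs(sum(P[i]) - r[i]) for i in range(n)) < tol:
            break
    return f, g

def annealed_ot(C, r, c, eps0=1.0, eps_min=0.01, decay=0.5):
    # Hypothetical schedule: shrink eps geometrically, reusing f and g as warm starts.
    f, g = [0.0] * len(r), [0.0] * len(c)
    eps = eps0
    while True:
        f, g = sinkhorn_log(C, r, c, eps, f, g)
        if eps <= eps_min:
            break
        eps = max(eps * decay, eps_min)
    P = plan(C, f, g, eps)
    cost = sum(P[i][j] * C[i][j] for i in range(len(r)) for j in range(len(c)))
    return P, cost

# Toy 2x2 problem: moving 0.25 units of mass across a unit distance is optimal.
P, cost = annealed_ot([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.25, 0.75])
```

For this toy problem the exact OT cost is 0.25, and the annealed estimate should land close to that value once the final temperature is small. The warm start is the point of the sketch: each stage begins from the potentials of the previous, more strongly regularized stage rather than from zero.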

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: Added the following:
- vertical dashed lines for the CPU-based solver in Figure 5.
- dashed reference lines for the $O(n^{2.5})$ scaling for visual comparison to Figure 6.
- CPU make and model at the end of the first paragraph of Section 5, to give context for the CPU-based solver environment.
- discussion of the CPU-based solver in Section 5.3 (last 4 lines).
- Appendix E, whose newly added Table 1 compares the proposed algorithm to the CPU-based network simplex approach at a larger problem size ($n=16384$) than in the main text ($n=4096$), in order to compare scaling behavior.
- pointers to the appendix sections at the beginning of the Appendix (for easy access, like a table of contents).

Supplementary Material: zip

Assigned Action Editor: ~Rémi_Flamary1

Submission Number: 4230
