On the Gradient Formula for learning Generative Models with Regularized Optimal Transport Costs
Abstract: Learning a Wasserstein Generative Adversarial Network (WGAN) requires differentiating the optimal transport cost with respect to the parameters of the generative model. In this work, we provide sufficient conditions for the existence of a gradient formula in two different frameworks: the case of semi-discrete optimal transport (i.e. with a discrete target distribution) and the case of regularized optimal transport (i.e. with an entropic penalty). In both cases, the gradient formula involves a solution of the semi-dual formulation of the optimal transport cost. Our study establishes a connection between the gradient of the WGAN loss function and the Laguerre diagrams associated with semi-discrete transport maps. The learning problem is addressed with an alternating algorithm, which does not converge in general. However, in most cases, it stabilizes close to a relevant solution of the generative learning problem. We also show that entropic regularization can improve the convergence speed but noticeably changes the shape of the learned generative model.
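The gradient formula summarized in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a squared Euclidean cost, an empirical batch of latent samples `z`, a discrete target `y` with weights `nu`, and a hypothetical affine generator `G_theta(z) = z W + b`. The names `semi_dual`, `generator`, `theta_grad` and `alternating_step` are illustrative only. With `eps == 0`, the assignment matrix `P` records the Laguerre cell that contains each generated point (semi-discrete case). With `eps > 0`, it holds the softmax weights of the entropic c-transform. In both cases, the parameter gradient is computed with the semi-dual potential `psi` held fixed.

```python
import numpy as np


def semi_dual(x, y, psi, nu, eps):
    """Empirical semi-dual F(psi) = mean_i psi^{c,eps}(x_i) + sum_j psi_j nu_j.

    x: (n, d) generated samples, y: (m, d) target points, nu: (m,) weights.
    Assumes the squared Euclidean cost c(x, y) = |x - y|^2.
    Returns F, the (n, m) assignment matrix P and the cost matrix C.
    """
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    if eps == 0:
        # Hard c-transform psi^c(x) = min_j c(x, y_j) - psi_j; the argmin
        # is the index of the Laguerre cell that contains x.
        scores = C - psi[None, :]
        P = np.zeros_like(C)
        P[np.arange(x.shape[0]), scores.argmin(axis=1)] = 1.0
        ct = scores.min(axis=1)
    else:
        # Soft c-transform: -eps * log sum_j nu_j exp((psi_j - c(x, y_j)) / eps),
        # computed with a stabilized log-sum-exp.
        A = (psi[None, :] - C) / eps + np.log(nu)[None, :]
        amax = A.max(axis=1, keepdims=True)
        lse = amax[:, 0] + np.log(np.exp(A - amax).sum(axis=1))
        ct = -eps * lse
        P = np.exp(A - lse[:, None])  # softmax weights, each row sums to 1
    return ct.mean() + psi @ nu, P, C


def generator(theta, z):
    # Hypothetical affine generator G_theta(z) = z W + b, for illustration only.
    W, b = theta
    return z @ W + b


def theta_grad(theta, z, y, P):
    # Gradient of F w.r.t. theta with psi held fixed: each generated point x_i
    # contributes sum_j P_ij * grad_x c(x_i, y_j) = 2 (x_i - sum_j P_ij y_j).
    g = 2.0 * (generator(theta, z) - P @ y)
    return z.T @ g / z.shape[0], g.mean(axis=0)


def alternating_step(theta, psi, z, y, nu, eps, n_inner=10, lr_psi=0.5, lr_theta=0.05):
    # (1) Gradient ascent on the concave semi-dual in psi: dF/dpsi = nu - mean_i P_i.
    for _ in range(n_inner):
        _, P, _ = semi_dual(generator(theta, z), y, psi, nu, eps)
        psi = psi + lr_psi * (nu - P.mean(axis=0))
    # (2) One gradient descent step on the generator parameters.
    _, P, _ = semi_dual(generator(theta, z), y, psi, nu, eps)
    gW, gb = theta_grad(theta, z, y, P)
    return [theta[0] - lr_theta * gW, theta[1] - lr_theta * gb], psi
```

In this sketch, `psi` is only approximately maximized by a few ascent steps, so the generator update is an approximation of the gradient formula. In the hard case (`eps == 0`), the expression is valid only where the Laguerre cell assignment of each sample is locally constant.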
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We would like to thank the reviewers and the editor for their valuable suggestions and comments. Regarding the main concern about the length of the paper, we have followed the editor's suggestion and shortened the document: the main body has been reduced from 31 to 24 pages. To this end, Section 5 (“Interpretation with Derivatives in the Sense of Distributions”) and all remarks have been removed. Some material has been moved to the appendices to get closer to the suggested target length:
- Appendix B “Counter-example with regularized OT” extends the analysis of Section 2.3;
- The proof of Theorem 6 (about the gradient of the Sinkhorn divergence, formerly in Section 4.3) is now in Appendix E.
We believe that these changes, along with minor modifications of the text and figures, have made the paper clearer and more direct. In particular, most of the figures have been completely redrawn for improved legibility. Additionally, a new Appendix A “A Neural Network Architecture validating Hypothesis (GΘ)” shows that the regularity hypothesis holds for a large class of generative neural networks. This addition, among other modifications, addresses the question of connecting the theoretical findings with practical observations from experiments.
Supplementary Material: zip
Assigned Action Editor: ~marco_cuturi2
Submission Number: 320