A Theory for Conditional Generative Modeling on Multiple Data Sources

Published: 06 Mar 2025, Last Modified: 13 Apr 2025 | ICLR 2025 DeLTa Workshop Poster | License: CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: multiple data sources, distribution estimation, MLE, generative model
TL;DR: We analyze distribution estimation of conditional generative modeling on multiple data sources via MLE from the perspective of statistical learning theory.
Abstract: The success of large generative models has driven a paradigm shift, leveraging massive multi-source data to enhance model capabilities. However, the interaction among these sources remains theoretically underexplored. This paper takes a first step toward a rigorous analysis of multi-source training in conditional generative modeling, where each condition represents a distinct data source. Specifically, we establish a general distribution estimation error bound in average total variation distance for conditional maximum likelihood estimation (MLE) based on the bracketing number. Our result shows that when the source distributions are similar and the model is sufficiently expressive, multi-source training guarantees a sharper bound than single-source training. We further instantiate the general theory on conditional Gaussian estimation as an illustrative example. The result highlights that both the number of sources and the similarity among source distributions strengthen the advantage of multi-source training. Simulations and real-world experiments validate our findings. We hope this work inspires further theoretical understanding of multi-source training in generating modeling.
Submission Number: 60
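
The abstract's conditional Gaussian example can be illustrated with a small toy simulation. The sketch below is not the paper's construction or estimator; it assumes a simplified setting with K Gaussian sources N(mu_k, 1) whose means are close to a common center, and contrasts per-source MLE with a heuristic pooled (shrinkage-style) estimate that mimics the benefit of joint multi-source training when sources are similar. The names `delta` (source similarity) and `lam` (shrinkage weight) are illustrative choices, not quantities from the paper.

```python
import numpy as np

# Toy illustration (assumption: simplified Gaussian mean estimation, not the
# paper's estimator). K similar sources N(mu_k, 1) with means spread by delta
# around a common center. We compare:
#   (a) single-source MLE: estimate mu_k from that source's samples only;
#   (b) a shrinkage-style multi-source estimate that pulls each per-source mean
#       toward the pooled mean, mimicking joint training on similar sources.

rng = np.random.default_rng(0)
K, n = 10, 20                      # number of sources, samples per source
delta = 0.1                        # similarity: spread of true means around 0
true_means = rng.normal(0.0, delta, size=K)
data = true_means[:, None] + rng.normal(size=(K, n))   # shape (K, n)

# (a) single-source MLE: per-source sample mean
single = data.mean(axis=1)

# (b) multi-source estimate: shrink each per-source mean toward the pooled mean.
#     The weight lam is a heuristic choice: more shrinkage when sources are
#     more similar (small delta) or samples per source are few (small n).
pooled = data.mean()
lam = 1.0 / (1.0 + n * delta**2)
multi = lam * pooled + (1 - lam) * single

mse_single = np.mean((single - true_means) ** 2)
mse_multi = np.mean((multi - true_means) ** 2)
print(f"avg squared error, single-source: {mse_single:.4f}")
print(f"avg squared error, multi-source : {mse_multi:.4f}")
```

Under these assumptions, shrinking per-source estimates toward the pooled mean typically lowers the average error when `delta` is small, consistent with the abstract's message that similarity among sources and a larger number of sources make multi-source training more advantageous.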