TL;DR: We study the theoretical properties of conditional deep generative models under the framework of distribution regression, where the response variable lies in a high-dimensional ambient space but concentrates around a lower-dimensional manifold.
Abstract: In this work, we explore the theoretical properties of conditional deep generative models under the statistical framework of distribution regression, where the response variable lies in a high-dimensional ambient space but concentrates around a potentially lower-dimensional manifold. More specifically, we study the large-sample properties of a likelihood-based approach for estimating these models. Our results yield the convergence rate of a sieve maximum likelihood estimator (MLE) for estimating the conditional distribution of the response given predictors in the Hellinger metric, and of its deconvolved counterpart in the Wasserstein metric.
Our rates depend solely on the intrinsic dimension and smoothness of the true conditional distribution. From the perspective of statistical foundations, these findings explain why conditional deep generative models can circumvent the curse of dimensionality, and they show that such models can learn a broader class of nearly singular conditional distributions. Our analysis also highlights the importance of introducing a small noise perturbation to the data when they are supported sufficiently close to a manifold. Finally, our numerical studies demonstrate an effective implementation of the proposed approach on both synthetic and real-world datasets, providing complementary validation of our theoretical findings.
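For readers who want the estimator in symbols, the following is a minimal sketch in notation of our own choosing; the paper's exact sieve construction and rate exponents are not reproduced here. Given i.i.d. pairs $(X_i, Y_i)$ and a sieve $\mathcal{F}_n$ of conditional densities realized by neural networks whose size grows with $n$:

```latex
% Generic sieve MLE (our notation, not necessarily the paper's):
\[
  \widehat{p}_n \;=\; \operatorname*{arg\,max}_{p \,\in\, \mathcal{F}_n}\;
  \frac{1}{n}\sum_{i=1}^{n} \log p\bigl(Y_i \mid X_i\bigr),
\]
% with accuracy measured, e.g., by an integrated conditional Hellinger distance
% (conventions on the 1/2 factor vary):
\[
  d_{\mathrm{H}}^{2}\bigl(\widehat{p}_n, p_0\bigr)
  \;=\; \frac{1}{2}\,\mathbb{E}_{X}\!\left[\int
  \Bigl(\sqrt{\widehat{p}_n(y \mid X)} - \sqrt{p_0(y \mid X)}\Bigr)^{2}\,
  \mathrm{d}y\right].
\]
```

The Wasserstein rate in the abstract concerns the (nearly) singular, manifold-supported counterpart of the conditional law, for which a density-based metric such as Hellinger is not the natural yardstick.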
Lay Summary: **Why does this matter?**
Modern data often live in huge, messy spaces (think 4K images or single-cell genomes) but actually hide on thin “surfaces” inside those spaces. Learning *how the entire distribution* of outcomes changes with predictors in such settings is critical for tasks like probabilistic forecasting or conditional image generation, yet classical statistical methods choke on the sheer dimensionality.
**What did we do?**
We show, on paper and with proofs, that a familiar deep-learning workhorse, the conditional likelihood-based deep generative model (for example, VAEs, normalizing flows, and kindred models), can estimate those high-dimensional conditional distributions *as if* the data were low-dimensional. The trick is a likelihood-based fit called a **sieve maximum-likelihood estimator**, whose complexity grows with the sample size. We derive crisp formulas for its error: they depend only on the *intrinsic* dimension and smoothness of the true data manifold, not on the giant ambient space.
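To make "a likelihood-based fit whose complexity grows with sample size" concrete, here is a minimal PyTorch sketch, ours rather than the paper's implementation: a conditional Gaussian head whose hidden width is set from the sample size `n` and which is trained by maximizing the conditional log-likelihood. The width schedule, architecture, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (ours, not the paper's code) of a "sieve"-style conditional
# likelihood fit: network capacity grows with the sample size n, and training
# maximizes the conditional log-likelihood of y given x.
import math
import torch
import torch.nn as nn

def make_model(n, x_dim, y_dim):
    width = max(16, int(n ** 0.5))  # illustrative sieve schedule: width grows with n
    return nn.Sequential(
        nn.Linear(x_dim, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, 2 * y_dim),  # per-coordinate mean and log-scale of y | x
    )

def neg_log_lik(model, x, y):
    # Conditional Gaussian negative log-likelihood; a simple stand-in for the
    # VAE / normalizing-flow likelihoods named above.
    mu, log_sigma = model(x).chunk(2, dim=-1)
    z = (y - mu) / log_sigma.exp()
    return (0.5 * z ** 2 + log_sigma + 0.5 * math.log(2 * math.pi)).sum(-1).mean()

def fit(x, y, epochs=500, lr=1e-2):
    model = make_model(len(x), x.shape[1], y.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = neg_log_lik(model, x, y)
        loss.backward()
        opt.step()
    return model
```

Swapping the Gaussian head for a conditional VAE or normalizing flow recovers the model families named above; the sieve idea is only that the fitted function class is allowed to grow as more data arrive.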
**Why is it cool?**
1. It mathematically explains why deep generative models sidestep the curse of dimensionality.
2. It covers noisy, near-manifold data—realistic for cameras and sensors.
3. It even says how much “dither” noise to add when data are *too* clean.
Simulations (two-moons, ellipses, MNIST) line up with the theory, hinting at reliable uncertainty quantification for next-gen generative AI.
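For intuition about the near-manifold setting and the "dither" advice, here is a hedged sketch of what a two-moons-style simulation could look like; the noise level, sample size, and conditioning scheme below are our own choices, not the paper's settings.

```python
# Hedged sketch of a near-manifold simulation in the spirit of the two-moons
# experiment; the noise level and conditioning scheme are our own choices.
import numpy as np

def sample_two_moons(n, noise=0.0, seed=None):
    rng = np.random.default_rng(seed)
    label = rng.integers(0, 2, size=n)       # binary predictor: which moon
    t = rng.uniform(0.0, np.pi, size=n)      # position along each 1-D arc
    y1 = np.where(label == 0, np.cos(t), 1.0 - np.cos(t))
    y2 = np.where(label == 0, np.sin(t), 0.5 - np.sin(t))
    resp = np.stack([y1, y2], axis=1)
    resp += noise * rng.standard_normal(resp.shape)  # small Gaussian "dither"
    return label.reshape(-1, 1).astype(float), resp

# Responses supported exactly on two 1-D arcs vs. slightly dithered responses:
x_clean, y_clean = sample_two_moons(2000, noise=0.0, seed=0)
x_dith, y_dith = sample_two_moons(2000, noise=0.05, seed=0)
```

The clean responses lie exactly on two one-dimensional arcs indexed by the predictor; the dithered version is what the "add a little noise when data are too clean" recommendation produces before running a likelihood fit such as the sketch above.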
Primary Area: Deep Learning->Theory
Keywords: Distribution Regression; Conditional Deep Generative Models; Intrinsic Manifold Structure; Sieve MLE; Wasserstein Convergence.
Submission Number: 11110