Neural Network-Based Diffusion Models Adapt to Low-Dimensional Multi-Modal Data Structure

Published: 26 May 2026, Last Modified: 26 May 2026
Venue: ICML 2026 FoGen Workshop Poster
License: CC BY 4.0
Keywords: Diffusion models, Neural networks, Sample complexity, Minimax optimality
Abstract: Score-based diffusion models have become a leading approach to generative modeling, with strong empirical performance on high-dimensional data. Their success is particularly striking when the data possess hidden low-dimensional and multi-modal structure. However, a sharp statistical explanation for this phenomenon remains incomplete. Existing theoretical guarantees often rely on assumptions such as uniformly bounded densities, globally smooth score functions, or log-concavity, which are typically too restrictive to describe high-dimensional data in practice. This work proves that neural-network-based diffusion models can adapt to low-dimensional multi-modal data distributions. Under a union-of-subspaces model with subgaussian components, we show that $\widetilde{O}(\varepsilon^{-k})$ samples suffice to achieve $\varepsilon$ error in 1-Wasserstein distance, where $k$ denotes the intrinsic dimension. The resulting rate depends on $k$ rather than on the ambient dimension, showing that neural-network score estimation can exploit the underlying structure of the data. We further extend the theory to near-union-of-subspaces distributions, establishing robustness to deviations from the ideal model. These results provide statistical support for the adaptivity of practical diffusion models on structured high-dimensional data.
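To make the data model concrete, the following is a minimal NumPy sketch of one possible union-of-subspaces distribution. It is an illustration under stated assumptions, not the paper's construction: the helper names, the number of components, the uniform mixture weights, the random orthonormal bases, and the standard Gaussian latent coordinates (one example of subgaussian components) are all hypothetical choices, as is the ambient dimension parameter d. The sketch checks that every sample lies in a k-dimensional subspace and tabulates the leading-order sample size $\varepsilon^{-k}$, ignoring constants and logarithmic factors.

```python
import numpy as np

def sample_union_of_subspaces(n, d, k, num_components, rng):
    """Draw n points from a hypothetical union-of-subspaces mixture in R^d.

    Component j has an orthonormal basis U_j (d x k); a sample is x = U_j z
    with latent z ~ N(0, I_k), a Gaussian and hence subgaussian choice (assumption).
    """
    # Random orthonormal bases from the reduced QR factorization of Gaussian matrices.
    bases = [np.linalg.qr(rng.standard_normal((d, k)))[0] for _ in range(num_components)]
    # Uniform mixture weights (assumption; any probability vector would do).
    weights = np.full(num_components, 1.0 / num_components)
    labels = rng.choice(num_components, size=n, p=weights)
    z = rng.standard_normal((n, k))
    x = np.stack([bases[j] @ z[i] for i, j in enumerate(labels)])
    return x, labels, bases

def projection_residual(x, labels, bases):
    """Largest distance from a sample to its own component's subspace (about 0)."""
    res = [np.linalg.norm(x[i] - bases[j] @ (bases[j].T @ x[i]))
           for i, j in enumerate(labels)]
    return max(res)

def samples_for_error(eps, dim):
    """Leading-order sample size eps^(-dim), ignoring constants and log factors."""
    return eps ** (-dim)

rng = np.random.default_rng(0)
x, labels, bases = sample_union_of_subspaces(n=500, d=100, k=3, num_components=4, rng=rng)
print(x.shape)  # (500, 100)
print(projection_residual(x, labels, bases) < 1e-10)  # True
# Each component's samples span only k = 3 of the d = 100 ambient directions.
print(np.linalg.matrix_rank(x[labels == 0]))
print(samples_for_error(0.1, 3), samples_for_error(0.1, 100))
```

With $\varepsilon = 0.1$, an $\varepsilon^{-k}$ scaling gives roughly $10^3$ samples for k = 3, whereas the same exponent with the hypothetical ambient dimension d = 100 would give $10^{100}$. This contrast only illustrates why a rate in k rather than the ambient dimension matters; it is not a bound taken from the paper.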
Submission Number: 125