Dimension-Independent Rates for Structured Neural Density Estimation

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 poster
License: CC BY 4.0
TL;DR: We show how deep neural networks achieve dimension-independent rates for learning structured densities typical of image, audio, video, and text data.
Abstract: We show that deep neural networks can achieve dimension-independent rates of convergence for learning structured densities typical of image, audio, video, and text data. For example, in images, where each pixel becomes independent of the rest of the image when conditioned on pixels at most $t$ steps away, a simple $L^2$-minimizing neural network can attain a rate of $n^{-1/((t+1)^2+4)}$, where $t$ is independent of the ambient dimension $d$, i.e., the total number of pixels. We further provide empirical evidence that, in real-world applications, $t$ is often a small constant, thus effectively circumventing the curse of dimensionality. Moreover, for sequential data (e.g., audio or text) exhibiting a similar local dependence structure, our analysis yields a rate of $n^{-1/(t+5)}$, offering further evidence of dimension independence in practical scenarios.
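As an illustrative worked example (ours, not drawn from the paper): with a local dependence radius of $t = 2$, the image rate above becomes $n^{-1/((2+1)^2+4)} = n^{-1/13}$ and the sequential rate becomes $n^{-1/(2+5)} = n^{-1/7}$. Both exponents are determined by $t$ alone, so they are the same whether the image has a hundred pixels or a million.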
Lay Summary: Modern machine-learning systems often work with “high-dimensional” data—think of a photo with millions of pixels and three numbers (red, green, and blue) for each pixel. Classical statistics predicts that learning from such data would require an impossibly large number of examples, yet in practice deep learning succeeds with far fewer. Our study offers an explanation for why: real-world data has inherent structure that neural networks can exploit to learn from less data. The idea is that only certain parts of the data are useful for predicting other parts, and this structure is naturally captured by commonly used neural network models. We show that this structure dramatically reduces the amount of data a neural network needs, bringing it down to the level of much smaller problems. This provides a fresh explanation for how neural networks learn effectively from the data available in modern applications.
Primary Area: Theory->Learning Theory
Keywords: density estimation, image processing, Markov random field, graphical model, nonparametric statistics, high-dimensional statistics, sample complexity
Submission Number: 12334