Why do wide stochastic neural networks have vanishing variance?

TMLR Paper740 Authors

31 Dec 2022 (modified: 03 May 2023) · Rejected by TMLR
Abstract: This work studies the prediction variance of stochastic neural networks, one of the main types of neural networks in use. We constructively prove that as the width of an optimized stochastic neural network tends to infinity, its predictive variance on the training set decreases to zero. In particular, we show that a solution with vanishing variance exists when the model has a "two-layer" structure, where one layer can produce independent copies of the latent variable and the subsequent layer can average over such copies to cancel the noise. The main implication of our result is that the popular belief that a powerful decoder is what causes the prediction variance of a neural network to vanish is not the full picture. Two common examples of learning systems to which our theory can be relevant are neural networks with dropout and Bayesian latent variable models in a special limit. Our result thus helps us better understand how stochasticity affects the learning of neural networks.
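The following is a minimal NumPy sketch, not the paper's actual construction, that illustrates the averaging mechanism the abstract describes: if a wide layer carries independent noisy copies of the same latent value and the next layer averages them, the prediction variance on a fixed input shrinks roughly as 1/width. The toy latent feature `np.sin(x)` and all parameter names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_prediction(x, width, noise_std=1.0, n_samples=2000):
    """Predictions of a toy 'two-layer' stochastic model on a single input x.

    Each of the `width` hidden units holds an independent noisy copy of the
    same latent value; the output layer simply averages over them.
    Returns `n_samples` independent forward passes.
    """
    latent = np.sin(x)  # deterministic latent feature of the input (arbitrary choice)
    # Fresh per-unit noise for every forward pass
    copies = latent + noise_std * rng.standard_normal((n_samples, width))
    return copies.mean(axis=1)  # average over the width dimension

x = 0.5
for width in [10, 100, 1000, 10000]:
    preds = stochastic_prediction(x, width)
    print(f"width={width:>6d}  prediction variance ~ {preds.var():.2e}")
# The empirical variance decays roughly as noise_std**2 / width,
# vanishing as the width tends to infinity.
```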
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Stanislaw_Kamil_Jastrzebski1
Submission Number: 740