Scaling Laws for Uncertainty in Deep Learning

20 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Scaling Laws, Bayesian Deep Learning, Uncertainty Quantification
Abstract: Scaling laws in deep learning describe the predictable relationship between a model's performance, usually measured by test loss, and key design choices such as dataset and model size. Inspired by these findings and by the fascinating phenomena emerging in the over-parameterized regime, we investigate a parallel direction: do similar scaling laws govern predictive uncertainties in deep learning? In identifiable parametric models, such scaling laws can be derived straightforwardly by treating model parameters in a Bayesian way; for example, we obtain $O(1/N)$ contraction rates for epistemic uncertainty with respect to dataset size $N$. In over-parameterized models, however, these guarantees do not hold, and the resulting behavior is largely unexplored. In this work, we empirically show the existence of scaling laws for various measures of predictive uncertainty with respect to dataset and model size. Through experiments on vision and language tasks, we observe such scaling laws for in- and out-of-distribution predictive uncertainty estimated with popular approximate Bayesian inference and ensemble methods. Beyond the elegance of scaling laws and the practical utility of extrapolating uncertainties to larger data or models, this work provides strong evidence to dispel a recurring skepticism toward Bayesian approaches: *"In many applications of deep learning we have so much data available: what do we need Bayes for?"* Our findings show that *"so much data"* is typically not enough to make epistemic uncertainty negligible.
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 25158
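
The $O(1/N)$ contraction rate mentioned in the abstract for identifiable parametric models can be illustrated with a toy case. The sketch below is not taken from the paper: it assumes a conjugate Gaussian model (unknown mean, known noise variance), uses the posterior variance of the mean as a stand-in for epistemic uncertainty, and recovers the scaling exponent with a log-log least-squares fit. The function names `posterior_variance` and `fit_power_law`, and the values of `noise_var` and `prior_var`, are hypothetical choices made for illustration.

```python
import numpy as np


def posterior_variance(n, noise_var=1.0, prior_var=10.0):
    """Posterior variance of a Gaussian mean after n observations.

    Assumes a conjugate Gaussian prior with variance `prior_var` and
    i.i.d. Gaussian observations with known variance `noise_var`.
    """
    return 1.0 / (1.0 / prior_var + n / noise_var)


def fit_power_law(ns, values):
    """Fit values ~ c * n**slope by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(ns), np.log(values), deg=1)
    return slope, np.exp(intercept)


# Dataset sizes spanning a few orders of magnitude (illustrative values).
ns = np.array([100, 300, 1_000, 3_000, 10_000, 30_000])

# In this toy model the epistemic uncertainty has a closed form,
# so no sampling or approximate inference is needed.
epistemic = posterior_variance(ns)

slope, c = fit_power_law(ns, epistemic)
print(f"fitted exponent: {slope:.3f}, prefactor: {c:.3f}")
```

In this identifiable setting the fitted exponent should sit very close to -1, matching the $O(1/N)$ rate. For over-parameterized networks no such closed form is available, so the same log-log fit would have to be applied to uncertainty estimates measured empirically at several dataset or model sizes.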