Directly Optimizing Calibrated Test-Time Uncertainty

Published: 01 Mar 2026, Last Modified: 07 Apr 2026, TTU at ICLR 2026 (Main), CC BY 4.0
Abstract: Uncertainty in learned predictors is often split into aleatoric and epistemic components through architectural choices or Bayesian approximations, which makes the decomposition sensitive to modeling details. We propose an objective-driven decomposition into predictive noise ($\psi$) and generalization noise ($\phi$). Predictive noise is the residual stochasticity needed to fit the training data under a chosen likelihood and model class, while generalization noise captures the instability of the learned predictor as revealed by held-out data. Both noise terms can be instantiated as additive randomness in the predictive distribution (output-only or internal), and they are separable because they are optimized on different data splits with different losses: the standard training negative log-likelihood (NLL) for $(\theta,\psi)$ and a held-out marginal log-likelihood for $\phi$. The resulting total predictive distribution improves reliability without explicit ensembles and yields noise-budget learning curves that explain how performance changes with dataset size and model capacity. We demonstrate the decomposition on a controlled mixture model and on MLP regression.
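To make the two-objective split concrete, here is a minimal sketch under strong simplifying assumptions that are ours, not the paper's: a 1-D polynomial model, a Gaussian likelihood, and output-only additive Gaussian noise, so $\psi$ and $\phi$ are variances. Under these assumptions, maximizing the training log-likelihood gives least-squares $\theta$ and $\psi$ equal to the mean squared training residual. Marginalizing the extra noise gives a held-out predictive $\mathcal{N}(f_\theta(x), \psi + \phi)$, whose log-likelihood over $\phi \ge 0$ is maximized in closed form at $\phi = \max(0, \text{held-out MSE} - \psi)$. All function names (`fit_theta_psi`, `fit_phi`, `make_data`, etc.), the synthetic sine data, and the chosen degrees are hypothetical illustrations, not the authors' implementation or experiments.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poly(xs, ys, degree, ridge=1e-8):
    """Least-squares polynomial fit via the normal equations (tiny ridge for stability)."""
    k = degree + 1
    feats = [[x ** p for p in range(k)] for x in xs]
    a = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    return solve(a, b)


def predict(theta, x):
    return sum(c * x ** p for p, c in enumerate(theta))


def residuals(theta, data):
    return [y - predict(theta, x) for x, y in data]


def gaussian_loglik(res, var):
    """Sum of log N(r; 0, var) over the residuals r."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + r * r / var) for r in res)


def fit_theta_psi(train, degree):
    """Training NLL for (theta, psi): least-squares mean, MLE output variance psi."""
    xs, ys = zip(*train)
    theta = fit_poly(list(xs), list(ys), degree)
    res = residuals(theta, train)
    # Floor avoids a zero variance if the model interpolates the training set.
    psi = max(sum(r * r for r in res) / len(res), 1e-12)
    return theta, psi


def fit_phi(theta, psi, held):
    """Maximize held-out log N(y; f_theta(x), psi + phi) over phi >= 0 (closed form)."""
    res = residuals(theta, held)
    mse = sum(r * r for r in res) / len(res)
    return max(0.0, mse - psi)


def make_data(n, rng, noise=0.1):
    """Hypothetical synthetic data: y = sin(3x) + Gaussian noise, x ~ U(-1, 1)."""
    pts = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        pts.append((x, math.sin(3 * x) + rng.gauss(0.0, noise)))
    return pts


if __name__ == "__main__":
    rng = random.Random(0)
    train, held, test = make_data(15, rng), make_data(200, rng), make_data(200, rng)
    for degree in (1, 3, 7):
        theta, psi = fit_theta_psi(train, degree)   # fit on the training split
        phi = fit_phi(theta, psi, held)             # fit on the held-out split
        test_res = residuals(theta, test)           # evaluate on a third split
        ll_psi = gaussian_loglik(test_res, psi) / len(test)
        ll_total = gaussian_loglik(test_res, psi + phi) / len(test)
        print(f"degree={degree} psi={psi:.4f} phi={phi:.4f} "
              f"test loglik/pt: psi-only={ll_psi:.3f} psi+phi={ll_total:.3f}")
```

In this toy setting $\phi$ is zero whenever held-out error does not exceed training error, and it grows as the fitted predictor generalizes worse. The separate test split is there so that the comparison between the $\psi$-only and $\psi + \phi$ predictive distributions is not evaluated on the data used to fit $\phi$. Internal (non-output) noise, which the abstract also allows, would generally lack this closed form and would need the marginal likelihood estimated numerically, for example by Monte Carlo sampling.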
Submission Number: 72