Efficient Uncertainty Estimation via Sensitivity-Guided Subnetwork Selection for Scalable Variational Inference
Abstract: Quantifying predictive uncertainty with minimal computational overhead remains a significant challenge for reliable deep learning applications in safety-critical systems. While Bayesian neural networks (BNNs) are the gold standard for uncertainty quantification, they require considerable training time and computational resources. Although a body of work has focused on mitigating the computational cost of BNN inference via post-hoc approaches, efforts to accelerate training and convergence remain limited. This paper proposes a partial Bayesian training approach via mean-field variational inference (VI), enabling controllable uncertainty modeling through sparse gradient representations. The selection of the variational Bayesian subnetwork is guided by a first-order gradient sensitivity analysis grounded in uncertainty propagation theory. Under mean-field assumptions, we demonstrate how this framework effectively informs the selection of parameters that represent the network's predictive uncertainty. The criterion also integrates efficiently into automatic-differentiation tools, avoiding additional computational burden. The resulting model combines deterministic and Bayesian parameters, facilitating an effective yet efficient representation of uncertainty. We investigate the effects of varying the proportion of Bayesian parameters (ranging from 1\% to 95\%) across diverse tasks, including regression, classification, and semantic segmentation. Experimental results on MNIST, CIFAR-10, ImageNet, and Cityscapes demonstrate that our approach achieves competitive performance and uncertainty estimates compared to ensemble methods. While using substantially fewer stochastic parameters, approximately 50\% and 80\% fewer than full VI and ensembles, respectively, our approach offers reduced training costs and faster convergence compared to full or partial VI trained from scratch.
Furthermore, we assess the robustness of predictive uncertainty in the presence of covariate shifts and out-of-distribution data, demonstrating that our method effectively captures uncertainty and exhibits robustness to image corruptions.
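As a rough illustration of the selection step described in the abstract, the sketch below ranks parameters by a first-order sensitivity score and marks the top fraction as Bayesian (variational), leaving the rest deterministic. This is a minimal sketch under assumptions: the squared-gradient score, the function name, and the flat-array representation are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def select_bayesian_subnetwork(grads, bayes_fraction=0.05):
    """Rank parameters by a first-order sensitivity proxy
    (here: squared gradient magnitude) and flag the top
    `bayes_fraction` of them as variational/Bayesian.

    grads: flat array of per-parameter gradients.
    Returns a boolean mask (True -> Bayesian parameter).
    """
    scores = grads ** 2                       # first-order sensitivity proxy
    k = max(1, int(bayes_fraction * grads.size))
    top_idx = np.argpartition(-scores, k - 1)[:k]  # indices of k largest scores
    mask = np.zeros(grads.size, dtype=bool)
    mask[top_idx] = True
    return mask

# Toy example: 10 parameters, convert the 20% most sensitive to Bayesian.
g = np.array([0.1, -2.0, 0.05, 0.3, 1.5, -0.2, 0.0, 0.8, -0.4, 0.02])
mask = select_bayesian_subnetwork(g, bayes_fraction=0.2)
```

In a real network, `grads` would come from the autodiff backward pass already computed during training, which is why such a criterion adds essentially no extra cost.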
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Fred_Roosta1
Submission Number: 4889