Efficient Uncertainty Estimation via Sensitivity-Guided Subnetwork Selection for Scalable Variational Inference

TMLR Paper4889 Authors

19 May 2025 (modified: 01 Aug 2025) · Rejected by TMLR · CC BY 4.0
Abstract: Quantifying predictive uncertainty with minimal computational overhead remains a significant challenge for reliable deep learning applications in safety-critical systems. While Bayesian neural networks (BNNs) are the gold standard for uncertainty quantification, they require considerable training time and computational resources. Although a body of work has focused on mitigating the computational cost of BNN inference via post-hoc approaches, efforts to accelerate training and convergence remain limited. This paper proposes a partial Bayesian training approach via mean-field variational inference (VI), enabling controllable uncertainty modeling through sparse gradient representations. The selection of the variational Bayesian subnetwork is guided by a first-order gradient sensitivity analysis, which is grounded in uncertainty propagation theory. Under mean-field assumptions, we demonstrate how this framework effectively informs the selection of parameters that represent the network's predictive uncertainty. This criterion also integrates efficiently into auto-differentiation tools, avoiding additional computational burden. The resulting model combines deterministic and Bayesian parameters, facilitating an effective yet efficient representation of uncertainty. We investigate the effects of varying the proportion of Bayesian parameters (ranging from 1\% to 95\%) across diverse tasks, including regression, classification, and semantic segmentation. Experimental results on MNIST, CIFAR-10, ImageNet, and Cityscapes demonstrate that our approach achieves competitive performance and uncertainty estimates compared to ensemble methods. While using substantially fewer parameters, approximately 50\% and 80\% fewer than full VI and ensembles respectively, our approach offers reduced training costs and faster convergence compared to full or partial VI trained from scratch. Furthermore, we assess the robustness of predictive uncertainty in the presence of covariate shifts and out-of-distribution data, demonstrating that our method effectively captures uncertainty and exhibits robustness to image corruptions.
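The abstract's core recipe, scoring each parameter with a first-order gradient sensitivity and promoting only the most sensitive fraction to a mean-field variational posterior while the rest stay deterministic, can be illustrated with a short sketch. The code below is not the authors' implementation: it assumes a PyTorch model, a hypothetical |gradient × weight| sensitivity score, and a user-chosen Bayesian fraction, and it omits the subsequent variational training loop.

```python
# Minimal sketch (assumptions, not the paper's code): select a variational Bayesian
# subnetwork from a pretrained deterministic network using a first-order
# gradient-sensitivity score, then initialize mean-field Gaussian posteriors
# only for the selected parameters.
import torch


def parameter_sensitivities(model, loader, loss_fn, n_batches=10, device="cpu"):
    """Accumulate a first-order sensitivity score per parameter over a few batches."""
    model.to(device).eval()
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss = loss_fn(model(x.to(device)), y.to(device))
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                # |gradient * weight| as a simple first-order proxy for sensitivity
                scores[n] += (p.grad * p.detach()).abs()
    return scores


def select_bayesian_mask(scores, bayes_fraction=0.10):
    """Mark the top `bayes_fraction` most sensitive parameters as Bayesian."""
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(bayes_fraction * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: (s >= threshold) for n, s in scores.items()}


def init_variational_params(model, masks, init_log_var=-10.0):
    """Create mean / log-variance tensors for the selected subnetwork; all other
    parameters remain deterministic point estimates."""
    variational = {}
    for n, p in model.named_parameters():
        if masks[n].any():
            variational[n] = {
                "mask": masks[n],
                "mu": p.detach().clone(),                     # start at pretrained weights
                "log_var": torch.full_like(p, init_log_var),  # small initial variance
            }
    return variational
```

In a full pipeline, only the masked entries would be sampled via the reparameterization trick and receive a KL penalty during VI training, while the unmasked parameters stay deterministic, which is what keeps the parameter count and training cost below full VI.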
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Improved writing across the manuscript as per the reviewers' suggestions.
- Replaced the relative data in Figures 5 and 6 with absolute metric values.
- Significantly improved the description of the method as per the reviewers' suggestions, and added a more in-depth derivation of the relationship between sensitivity and uncertainty.
- Improved the notation description, removed redundant information, and improved the flow of Section 3.
- Clarified the goals of the experiments in the introduction of Section 4.
- Corrected typos and inconsistencies in language.
- Added several appendices with more supporting information on the method (Appendices I and J).
- All changes to the manuscript have been highlighted in this revision.
Assigned Action Editor: ~Fred_Roosta1
Submission Number: 4889