On Joint Regularization and Calibration in Deep Ensembles

TMLR Paper4652 Authors

11 Apr 2025 (modified: 27 Apr 2025) · Under review for TMLR · CC BY 4.0
Abstract: Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy that relaxes the need for a common holdout set, thereby increasing ensemble diversity. Our results demonstrate that jointly tuning the ensemble matches or improves performance across all conditions, although the size of the improvement varies considerably across settings. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models.
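To make the contrast between individual and joint tuning concrete, the sketch below illustrates one of the hyperparameters the abstract mentions, temperature scaling, in two ways: fitting a separate temperature per ensemble member on held-out data versus fitting a single temperature that minimizes the negative log-likelihood of the averaged ensemble prediction. This is a minimal illustration under our own assumptions (toy synthetic logits, a shared holdout set, hypothetical helper functions such as `fit_temperature`), not the paper's actual procedure or its overlapping-holdout scheme.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def nll(probs, labels):
    # Negative log-likelihood of the true labels under predicted probabilities.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))


def fit_temperature(logits, labels):
    # Fit a single temperature T > 0 for one model by minimizing holdout NLL.
    def objective(log_t):
        return nll(softmax(logits / np.exp(log_t)), labels)
    res = minimize_scalar(objective, bounds=(-3, 3), method="bounded")
    return np.exp(res.x)


# Toy holdout logits for an ensemble of M models: shape (M, N, C).
rng = np.random.default_rng(0)
M, N, C = 5, 200, 10
labels = rng.integers(0, C, size=N)
ensemble_logits = rng.normal(size=(M, N, C)) + 2.0 * np.eye(C)[labels]

# Individual tuning: each member gets its own temperature.
individual_T = [fit_temperature(ensemble_logits[m], labels) for m in range(M)]
probs_individual = np.mean(
    [softmax(ensemble_logits[m] / individual_T[m]) for m in range(M)], axis=0
)

# Joint tuning: one temperature chosen to minimize the NLL of the
# *averaged* ensemble prediction on the shared holdout set.
def joint_objective(log_t):
    probs = np.mean(softmax(ensemble_logits / np.exp(log_t)), axis=0)
    return nll(probs, labels)

res = minimize_scalar(joint_objective, bounds=(-3, 3), method="bounded")
joint_T = np.exp(res.x)
probs_joint = np.mean(softmax(ensemble_logits / joint_T), axis=0)

print("individual-tuning NLL:", nll(probs_individual, labels))
print("joint-tuning NLL:     ", nll(probs_joint, labels))
```

In this toy setup the two strategies can give different ensemble NLLs because the jointly fitted temperature targets the calibration of the averaged prediction rather than of each member in isolation; the paper's partially overlapping holdout strategy would further change which examples each member (and the joint objective) is fit on.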
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Jasper_Snoek1
Submission Number: 4652