On Joint Regularization and Calibration in Deep Ensembles

TMLR Paper4652 Authors

11 Apr 2025 (modified: 08 Aug 2025)Under review for TMLREveryoneRevisionsBibTeXCC BY 4.0
Abstract: Deep ensembles are a powerful tool in machine learning, improving both model performance and uncertainty calibration. While ensembles are typically formed by training and tuning models individually, evidence suggests that jointly tuning the ensemble can lead to better performance. This paper investigates the impact of jointly tuning weight decay, temperature scaling, and early stopping on both predictive performance and uncertainty quantification. Additionally, we propose a partially overlapping holdout strategy as a practical compromise between enabling joint evaluation and maximizing the use of data for training. Our results demonstrate that jointly tuning the ensemble generally matches or improves performance, with significant variation in effect size across different tasks and metrics. We highlight the trade-offs between individual and joint optimization in deep ensemble training, with the overlapping holdout strategy offering an attractive practical solution. We believe our findings provide valuable insights and guidance for practitioners looking to optimize deep ensemble models.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Jasper_Snoek1
Submission Number: 4652
Loading