A Reproducibility Study of Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Abstract: Many neural networks, especially over-parameterized ones, suffer from poor calibration and overconfidence. To address this, Jordahn & Olmos (2024) recently proposed a Two-Stage Training (TST) procedure that decouples the training of the feature extraction and classification layers. In this study, we replicate their findings and extend their work through a series of ablation studies. We reproduce their main results and find that most of them replicate, with slight deviations on CIFAR100. Additionally, we extend the authors' results by exploring the impact of different model architectures, Monte Carlo (MC) sample sizes, and classification head designs. We further compare the method with focal loss -- an implicit regularization technique known to improve calibration -- and investigate whether calibration can be improved further by combining the two methods. Beyond focal loss, we also evaluate the effect of incorporating similar regularization techniques, such as label smoothing and L2 regularization, during two-stage training. We find that calibration can be improved even further by using focal loss in the first stage of two-stage training. Similar improvements are observed when combining two-stage training with label smoothing or L2 regularization. Our experiments validate the claims made by Jordahn & Olmos (2024) and demonstrate that two-stage training transfers to different architectures.
Submission Length: Long submission (more than 12 pages of main content)
Code: https://github.com/JohannaDK/Repro-of-Decoupled-Layers-for-Calibrated-NNs
Assigned Action Editor: ~Lei_Feng1
Submission Number: 4293
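The abstract reports that replacing cross-entropy with focal loss in the first training stage further improves calibration. As a point of reference, here is a minimal NumPy sketch of the standard focal loss formulation (Lin et al., 2017), FL(p_t) = -(1 - p_t)^gamma * log(p_t); the function name and the example batch are illustrative, not taken from the authors' code.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Focal loss averaged over a batch.

    probs:  (N, C) predicted class probabilities (rows sum to 1)
    labels: (N,)   integer class labels
    gamma:  focusing parameter; gamma = 0 recovers plain cross-entropy
    """
    # Probability assigned to the true class of each example
    p_t = probs[np.arange(len(labels)), labels]
    # Down-weight well-classified examples by the factor (1 - p_t)^gamma
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

# Toy batch: one confident and one less confident correct prediction
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
labels = np.array([0, 1])

ce = focal_loss(probs, labels, gamma=0.0)  # plain cross-entropy
fl = focal_loss(probs, labels, gamma=2.0)  # focal loss
```

With gamma > 0, confident predictions contribute less to the loss than under cross-entropy (fl < ce here), which is the implicit-regularization effect the study combines with two-stage training.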