Keywords: neural networks, uncertainty calibration, out-of-distribution detection
Abstract: Much recent work has been devoted to the problem of ensuring that a neural network's confidence scores match the true probability of being correct, i.e., the calibration problem. Notably, training with Focal loss was found to yield better-calibrated deep networks than cross-entropy loss while achieving the same level of accuracy \cite{mukhoti2020}. This success stems from Focal loss regularizing the entropy of the network's predictions (controlled by the hyper-parameter $\gamma$), thereby reining in the network's overconfidence. Further improvements in calibration can be achieved if $\gamma$ is selected independently for each training sample. However, the previously proposed strategy (named FLSD-53) relies on simple heuristics that, when selecting $\gamma$, take no account of whether the network is under- or over-confident about those samples, or by how much. As a result, in most cases, this strategy performs only slightly better. In this paper, we propose a calibration-aware, sample-dependent Focal loss called AdaFocal that adaptively modifies $\gamma$ from one training step to the next based on the network's current calibration behaviour. At each training step $t$, AdaFocal adjusts $\gamma_t$ based on (1) $\gamma_{t-1}$ from the previous training step and (2) the magnitude of the network's under/over-confidence. We evaluate our proposed method on various image recognition and NLP tasks, covering a variety of network architectures, and confirm that AdaFocal consistently achieves significantly better calibration than competing state-of-the-art methods without loss of accuracy.
One-sentence Summary: Our paper proposes a modification to Focal loss that improves the calibration of neural networks without any loss of accuracy.
Supplementary Material: zip
Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/arxiv:2211.11838/code)
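
To make the two ingredients described in the abstract concrete, here is a minimal PyTorch sketch: a focal loss with a per-sample focusing parameter $\gamma$, and a calibration-aware update that grows or shrinks each sample's $\gamma$ from its previous value depending on whether the network is over- or under-confident. The function names (`focal_loss`, `update_gamma`), the multiplicative exponential update, and the `rate`/`gamma_max` parameters are illustrative assumptions for this sketch, not the paper's exact AdaFocal rule.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma):
    """Focal loss with a per-sample focusing parameter gamma.

    logits:  (N, C) unnormalized class scores
    targets: (N,)   integer class labels
    gamma:   (N,)   per-sample focusing parameter
    """
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log-prob of the true class
    pt = log_pt.exp()
    return -((1.0 - pt) ** gamma * log_pt).mean()


def update_gamma(gamma_prev, confidence, accuracy, rate=1.0, gamma_max=20.0):
    """Hypothetical calibration-aware update (not the paper's exact rule):
    increase gamma when the network is over-confident (confidence > accuracy)
    and decrease it when under-confident, starting from the previous step's value.
    """
    calib_gap = confidence - accuracy  # > 0: over-confident, < 0: under-confident
    return (gamma_prev * torch.exp(rate * calib_gap)).clamp(1e-3, gamma_max)


# Example usage within a training step (tensors are assumed per-sample):
# gamma = update_gamma(gamma, confidence, accuracy)
# loss = focal_loss(logits, targets, gamma)
```

The key design point reflected here is that $\gamma$ is carried over between training steps rather than fixed or chosen by a static heuristic, so the loss can react to how miscalibrated the network currently is; consult the paper for the actual AdaFocal update.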