A Note On The Stability Of The Focal Loss

TMLR Paper5166 Authors

20 Jun 2025 (modified: 29 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: The Focal loss is a widely deployed loss function used to train various types of deep learning models. It is a modification of the cross-entropy loss designed to mitigate the effect of class imbalance in dense object detection tasks by down-weighting easy, well-classified examples. In doing so, more focus is placed on hard, wrongly classified examples, preventing the gradients from being dominated by examples for which the model can easily predict the correct class. This down-weighting is achieved by scaling the cross-entropy loss with a term that depends on a focusing parameter $\gamma$. In this paper, we highlight an unaddressed numerical instability of the Focal loss that arises when this focusing parameter is set to a value between 0 and 1. We present the theoretical foundation behind this instability, show that it is numerically identifiable, and demonstrate it in binary classification and segmentation tasks on the MNIST dataset. Additionally, we propose a straightforward modification to the original Focal loss that ensures stability whenever these unstable focusing parameter values are used.
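The instability described in the abstract can be sketched numerically. The snippet below is an illustrative reconstruction, not the paper's code: it expands the gradient of the foreground focal term $-(1-p)^\gamma \log(p)$ by the product rule, the way an autodiff engine would, and shows that the intermediate factor $\gamma(1-p)^{\gamma-1}$ produces $\infty \cdot 0 = \mathrm{NaN}$ when the predicted probability saturates to exactly $p = 1$ and $0 < \gamma < 1$. The `stable_focal_grad_wrt_p` variant with an added $\varepsilon$ is a hypothetical stabilization for illustration; the paper's exact modification may differ.

```python
import math

GAMMA = 0.5  # any focusing parameter in (0, 1) triggers the issue

def pow_ieee(x, y):
    """IEEE-754-style pow: 0.0 ** (negative) -> +inf, matching float
    tensor behavior in autodiff frameworks (Python's ** raises instead)."""
    if x == 0.0 and y < 0.0:
        return math.inf
    return x ** y

def focal_grad_wrt_p(p, gamma=GAMMA):
    """Gradient of the foreground focal term -(1-p)**gamma * log(p) w.r.t. p,
    expanded by the product rule as an autodiff engine applies it."""
    # Intermediate factor gamma * (1-p)**(gamma-1) diverges as p -> 1 for gamma < 1.
    dw = gamma * pow_ieee(1.0 - p, gamma - 1.0)
    return dw * math.log(p) - pow_ieee(1.0 - p, gamma) / p

def stable_focal_grad_wrt_p(p, gamma=GAMMA, eps=1e-6):
    """Hypothetical stabilized variant: keep the base of the power away
    from zero so the intermediate factor stays finite at p = 1."""
    dw = gamma * (1.0 - p + eps) ** (gamma - 1.0)
    return dw * math.log(p) - (1.0 - p + eps) ** gamma / p

# Near p = 1 the gradient is small and finite, but at exactly p = 1.0
# (e.g. a saturated sigmoid) the unstable form yields inf * 0 = NaN:
print(focal_grad_wrt_p(1.0 - 1e-6))   # small, finite
print(focal_grad_wrt_p(1.0))          # nan
print(stable_focal_grad_wrt_p(1.0))   # finite
```

Note that the analytic gradient itself tends to a finite value as $p \to 1$; the NaN arises purely from how the product rule factors the expression in floating-point arithmetic, which is why the problem is easy to miss until a single example saturates during training.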
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The changes made since the last submission are described below:
1. We included an additional experiment in which we performed a binary classification task on the CIFAR-10 dataset using a CNN and a Vision Transformer. The architecture of the Vision Transformer is included in the Appendix together with the code to run the classification experiment.
2. We added a paragraph in the introduction section on page 2 discussing related work on Focal-loss-based loss functions. We believe that these related works also suffer from the same numerical instability.
3. We added a marker in Figure 1 that indicates the point at which the numerical instability arises.
4. We removed the limit equations 8, 9, 16, and 17 on pages 3 and 5. The limit equations were replaced by equations that give the derivative of the foreground and background loss when the model output becomes equal to the ground-truth label.
5. We modified Figure 6 on page 10 to make clear that plot $(a)$ shows the training results when training with the unstable Focal loss, and plot $(b)$ shows the training results when training with the stabilized Focal loss.
6. We added function arguments for all equations.
Assigned Action Editor: ~Venkatesh_Babu_Radhakrishnan2
Submission Number: 5166