Minimax generalized cross-entropy
Abstract: Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the 0–1 loss can offer robustness but is difficult to optimize. Interpolating between the CE and 0–1 losses, generalized cross-entropy (GCE) was recently introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE provides an upper bound on the 0–1 classification risk. The proposed bilevel convex optimization can be implemented efficiently using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially under label noise.
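For context, below is a minimal sketch of the standard (non-minimax) generalized cross-entropy loss that the abstract builds on, written in PyTorch. The abstract does not specify the paper's MGCE formulation or its bilevel solver, so the function name `gce_loss` and the default `q` value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def gce_loss(logits: torch.Tensor, targets: torch.Tensor, q: float = 0.7) -> torch.Tensor:
    """Standard generalized cross-entropy (not the paper's MGCE):
    L_q(p_y) = (1 - p_y**q) / q, where p_y is the predicted probability
    of the true class. It recovers CE as q -> 0 and a bounded,
    MAE-like robust loss at q = 1.
    """
    probs = F.softmax(logits, dim=1)                      # class probabilities
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob. of true class
    return ((1.0 - p_y.clamp_min(1e-12) ** q) / q).mean()

# Example usage with random data (10 samples, 5 classes).
logits = torch.randn(10, 5, requires_grad=True)
targets = torch.randint(0, 5, (10,))
loss = gce_loss(logits, targets)
loss.backward()
```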
Submission Number: 1491