Keywords: implicit minimization, optimization bias, margin-based loss functions, flat minima
TL;DR: We study how the batch size and the learning rate affect the implicit minimization of different loss functions.
Abstract: Understanding the implicit bias of optimization algorithms is important for improving the generalization of neural networks. One way to exploit such understanding is to make the bias explicit in the loss function. Conversely, a promising route to gaining more insight into the implicit bias is to study how different loss functions are implicitly minimized while training the network. In this work, we focus on the inductive bias that arises when minimizing the cross-entropy loss with different batch sizes and learning rates. We investigate how three loss functions are implicitly minimized during training: the Hinge loss with different margins, the cross-entropy loss with different temperatures, and a newly introduced Gcdf loss with different standard deviations. The Gcdf loss establishes a connection between a sharpness measure for the 0-1 loss and margin-based loss functions. We find that a common behavior emerges across all the loss functions considered.
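To make the three families of losses concrete, the following is a minimal sketch of plausible definitions as functions of the signed margin y*f(x). The exact forms, in particular the Gcdf loss (assumed here to be the Gaussian CDF of the negative margin, i.e. the probability that a Gaussian perturbation of scale sigma flips the prediction), are assumptions for illustration and may differ from the paper's definitions.

```python
import numpy as np
from scipy.stats import norm

def hinge_loss(margin, gamma=1.0):
    # Hinge loss with margin parameter gamma: penalizes examples whose
    # signed margin y*f(x) falls below gamma.
    return np.maximum(0.0, gamma - margin)

def cross_entropy_loss(margin, temperature=1.0):
    # Binary cross-entropy (logistic) loss with temperature: the margin is
    # rescaled by 1/temperature before the softplus; logaddexp is used for
    # numerical stability.
    return np.logaddexp(0.0, -margin / temperature)

def gcdf_loss(margin, sigma=1.0):
    # Assumed Gcdf loss: Gaussian CDF of the negative margin, a smooth
    # surrogate for the 0-1 loss whose sharpness is controlled by sigma.
    return norm.cdf(-margin / sigma)

# Example: evaluate the three losses on a range of margins.
margins = np.linspace(-2.0, 2.0, 5)
print(hinge_loss(margins), cross_entropy_loss(margins), gcdf_loss(margins))
```

As sigma shrinks, the assumed Gcdf loss approaches the 0-1 loss, which is consistent with its described role as a bridge between a sharpness measure for the 0-1 loss and margin-based losses.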