Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
The Implicit Bias of Gradient Descent on Separable Data
Nov 03, 2017 (modified: Nov 03, 2017)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract: We show that gradient descent on an unregularized logistic
regression problem with separable data converges to the max-margin
solution. The result generalizes also to other monotone decreasing
loss functions with an infimum at infinity, and we also discuss a
multi-class generalizations to the cross entropy loss. Furthermore,
we show this convergence is very slow, and only logarithmic in the
convergence of the loss itself. This can help explain the benefit
of continuing to optimize the logistic or cross-entropy loss even
after the training error is zero and the training loss is extremely
small, and, as we show, even if the validation loss increases. Our
methodology can also aid in understanding implicit regularization in
more complex models and with other optimization methods.
TL;DR:The normalized solution of gradient descent on logistic regression (or a similarly decaying loss) slowly converges to the L2 max margin solution on separable data.