Gradient Descent Converges Linearly for Logistic Regression on Separable Data

Published: 01 Feb 2023, Last Modified: 13 Feb 2023 · Submitted to ICLR 2023 · Readers: Everyone
Keywords: logistic regression, gradient descent, sparse optimization
TL;DR: We theoretically show that gradient descent with increasing learning rate obtains favorable rates on logistic regression.
Abstract: We show that running gradient descent on the logistic regression objective guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$, where the error $\epsilon$ decays exponentially with the number of iterations. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. For separable data, our analysis proves that the error between the predictor returned by gradient descent and the hard SVM predictor decays as $\mathrm{poly}(1/t)$, exponentially faster than the previously known bound of $O(\log\log t / \log t)$. Our key observation is a (surprisingly) little-explored property of the logistic loss that we call multiplicative smoothness: as the loss decreases, the objective becomes (locally) smoother, and therefore the learning rate can increase. Our results also extend to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.
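To make the multiplicative-smoothness idea concrete, here is an illustrative sketch (not the paper's exact algorithm): gradient descent on the logistic loss over separable toy data, where the step size is taken inversely proportional to the current loss, so it grows as the loss decays. The schedule `eta = 1 / f(w)` and the toy dataset are hypothetical choices for illustration only.

```python
import numpy as np

def logistic_loss(w, X, y):
    # f(w) = mean_i log(1 + exp(-y_i <x_i, w>)); logaddexp is numerically stable
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def logistic_grad(w, X, y):
    margins = y * (X @ w)
    # sigmoid(-m) computed stably as exp(-log(1 + exp(m)))
    coeffs = -y * np.exp(-np.logaddexp(0.0, margins))
    return (X.T @ coeffs) / len(y)

def gd_increasing_lr(X, y, steps=200):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # step size grows as the loss shrinks (floor avoids division by zero
        # once the loss underflows); a hypothetical schedule, for illustration
        eta = 1.0 / max(logistic_loss(w, X, y), 1e-12)
        w -= eta * logistic_grad(w, X, y)
    return w

# Linearly separable toy data: the sign of the first coordinate sets the label.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-2.0, 0.5], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = gd_increasing_lr(X, y)
```

On this data the loss drops rapidly once the margins become positive, which is the qualitative behavior the abstract attributes to increasing learning rates; a fixed step size bounded by the global smoothness constant would shrink the loss far more slowly.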
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find the authors' identities.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Theory (eg, control theory, learning theory, algorithmic game theory)