Keywords: Non-convex Optimization, Non-Euclidean Acceleration, Stochastic Steepest Descent
Abstract: In this work, we analyze stochastic $\ell_p$ steepest descent for non-convex problems. Specifically, for $p > 2$, we establish $\epsilon$-approximate stationarity (in expectation) with respect to the dual norm $\Vert\cdot\Vert_{p^*}^{p^*}$ at a rate of $O(\epsilon^{-4})$, thereby generalizing the previous guarantees for signSGD ($p=\infty$). In addition, inspired by techniques from the convex setting, we present a new accelerated $\ell_p$ descent method, called Stacey, based on interpolated primal-dual iterate sequences designed for non-Euclidean smooth optimization. We compare our algorithm against popular methods such as SGD, Adam, AdamW, and Lion on image classification and language model pretraining tasks, and our results demonstrate the potential for both faster convergence and higher accuracy. We further evaluate our algorithm across different values of $p$, models, and datasets, highlighting the importance and efficiency of non-Euclidean methods compared to standard Euclidean-based approaches.
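For illustration only, below is a minimal Python sketch (not taken from the submission) of a plain stochastic $\ell_p$ steepest-descent update. It relies only on the standard fact that the direction maximizing $\langle g, d\rangle$ over $\Vert d\Vert_p \le 1$ is determined by the dual exponent $p^* = p/(p-1)$, and that $p = \infty$ recovers the signSGD update; the function name, step-size handling, and omission of the accelerated (Stacey) interpolation are assumptions, not the authors' algorithm.

```python
import numpy as np

def lp_steepest_descent_step(w, grad, lr, p=3.0):
    """One stochastic l_p steepest-descent update (illustrative sketch).

    The direction maximizing <grad, d> over ||d||_p <= 1 has coordinates
    sign(g_i) * |g_i|^(p*-1) / ||g||_{p*}^(p*-1), where p* = p/(p-1) is the
    dual exponent; as p -> infinity this reduces to the signSGD update.
    """
    if np.isinf(p):
        direction = np.sign(grad)                    # signSGD case (p = infinity)
    else:
        p_star = p / (p - 1.0)                       # dual exponent p* = p/(p-1)
        dual_norm = np.sum(np.abs(grad) ** p_star) ** (1.0 / p_star)
        if dual_norm == 0.0:
            return w                                 # zero gradient: no update
        direction = np.sign(grad) * np.abs(grad) ** (p_star - 1.0)
        direction /= dual_norm ** (p_star - 1.0)     # makes ||direction||_p = 1
    return w - lr * direction
```

For example, calling `lp_steepest_descent_step(w, g, lr=0.1, p=np.inf)` performs a signSGD-style step, while finite `p` interpolates toward the usual normalized (Euclidean) gradient step at `p=2`.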
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5528