Keywords: Stochastic optimization, clipping methods, non-convex optimization
TL;DR: We show almost sure convergence for a class of stochastic Hamiltonian descent methods. The analysis is applied to gradient clipping and normalization of SGD with momentum.
Abstract: Gradient normalization and soft clipping are two popular techniques for tackling instability issues and improving convergence of stochastic gradient descent (SGD) with momentum.
In this article, we study these types of methods through the lens of dissipative Hamiltonian systems. Gradient normalization and certain types of soft clipping algorithms can be seen as (stochastic) implicit-explicit Euler discretizations of dissipative Hamiltonian systems, where the kinetic energy function determines the type of clipping that is applied.
We make use of dynamical systems theory to show in a unified way that all of these schemes converge to stationary points of the objective function, almost surely, in several different settings:
a) for $L$-smooth objective functions, when the variance of the stochastic gradients is possibly infinite;
b) under the $(L_0,L_1)$-smoothness assumption, for heavy-tailed noise with bounded variance; and
c) for $(L_0,L_1)$-smooth functions in the empirical risk minimization setting, when the variance is possibly infinite but the expectation is finite.
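To fix ideas, here is a minimal sketch of normalized SGD with momentum, one member of the algorithm class the abstract describes. The specific soft-clipping map $m/(\gamma + \|m\|)$, the hyperparameters, and the toy noise model are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def normalized_momentum_sgd(grad, x0, lr=0.05, beta=0.9, gamma=1.0,
                            steps=500, noise=0.01, seed=0):
    """Sketch: SGD with momentum where the parameter step uses the
    normalized momentum m / (gamma + ||m||) -- a soft-clipping map of the
    kind induced by a choice of kinetic energy. Illustrative only."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x) + noise * rng.standard_normal(x.shape)  # stochastic gradient
        m = beta * m + (1.0 - beta) * g                     # momentum update
        x = x - lr * m / (gamma + np.linalg.norm(m))        # soft-clipped step
    return x

# usage: minimize f(x) = ||x||^2 / 2, whose gradient is x
x_star = normalized_momentum_sgd(lambda x: x, x0=[3.0, -2.0])
```

Because the step length is bounded by `lr` regardless of the gradient magnitude, iterates cannot blow up even under heavy-tailed gradient noise, which is the stability property the analysis targets.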
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10110