Keywords: Distributed optimization, saddle avoidance, gradient clipping
Abstract: A critical challenge in distributed nonconvex optimization is efficiently avoiding saddle points, which is vital for ensuring accurate and fast convergence to a desired equilibrium point. In this work, we demonstrate that gradient clipping, a technique widely used in machine learning to mitigate gradient explosion, can significantly accelerate the escape from saddle points and thereby speed up convergence to second-order stationary points in distributed optimization. More specifically, we propose an algorithm that exploits gradient clipping to achieve faster convergence in distributed nonconvex optimization. This result is significant because gradient clipping is already necessary and widely used in deep learning to prevent exploding gradients; hence, the additional benefit of faster saddle avoidance comes at no extra cost. In fact, we prove that our algorithm converges to a desired second-order stationary point faster than existing saddle-avoidance approaches for distributed optimization. Numerical experiments on benchmark datasets validate the effectiveness of the proposed method.
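To make the setting concrete, below is a minimal sketch, not the paper's algorithm, of decentralized gradient descent with per-agent gradient clipping on a toy nonconvex objective with a saddle at the origin. The mixing matrix W, the local quadratics H_i, the step size, and the clipping threshold c are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 2                              # number of agents, dimension
H = [np.diag([1.0, -1.0]) + 0.1 * rng.standard_normal((d, d)) for _ in range(n)]
H = [(M + M.T) / 2 for M in H]           # symmetric, indefinite local Hessians (assumed)

def local_grad(i, x):
    """Gradient of the toy local objective f_i(x) = 0.5 * x^T H_i x."""
    return H[i] @ x

def clip(g, c):
    """Rescale g to norm c if ||g|| exceeds c; otherwise leave it unchanged."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

W = np.full((n, n), 1.0 / n)             # doubly stochastic mixing matrix (complete graph)
X = 1e-3 * rng.standard_normal((n, d))   # agents start near the saddle at the origin
step, c = 0.05, 1.0                      # step size and clip threshold (assumed values)

for _ in range(200):
    X = W @ X                                                   # consensus averaging
    G = np.stack([clip(local_grad(i, X[i]), c) for i in range(n)])
    X = X - step * G                                            # clipped local gradient step

print(X.mean(axis=0))  # iterates leave the saddle along the negative-curvature direction
```

In this sketch, clipping bounds each agent's update norm while leaving its direction unchanged, so the iterates can still follow the negative-curvature direction away from the saddle.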
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 12562