Gradient-free Neural Network Training by Multi-convex Alternating Optimization

Junxiang Wang; Fuxun Yu; Xiang Chen; Liang Zhao

25 Sept 2019 (modified: 05 May 2023) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: neural network, alternating minimization, global convergence

TL;DR: We propose a novel Deep Learning Alternating Minimization (DLAM) algorithm to solve the fully-connected neural network problem with a convergence guarantee.

Abstract: In recent years, stochastic gradient descent (SGD) and its variants have been the dominant optimization methods for training deep neural networks. However, SGD suffers from limitations such as a lack of theoretical guarantees, vanishing gradients, excessive sensitivity to input, and difficulty handling highly non-smooth constraints and functions. To overcome these drawbacks, alternating minimization-based methods for deep neural network optimization have recently attracted rapidly increasing attention. Because this is an emerging and open domain, however, several new challenges need to be addressed, including 1) convergence that depends on the choice of hyperparameters, and 2) the lack of a unified theoretical framework with general conditions. We therefore propose a novel Deep Learning Alternating Minimization (DLAM) algorithm to address these two challenges. Our inequality-constrained formulation approximates the original problem with non-convex equality constraints arbitrarily closely, enabling our proof of global convergence of the DLAM algorithm under mild, practical conditions, regardless of the choice of hyperparameters and for a wide range of activation functions. Experiments on benchmark datasets demonstrate the effectiveness of DLAM.
