A Proximal Block Coordinate Descent Algorithm for Deep Neural Network Training
Tim Tsz-Kit Lau, Jinshan Zeng, Baoyuan Wu, Yuan Yao
Feb 12, 2018 (modified: Mar 21, 2018) | ICLR 2018 Workshop Submission | readers: everyone
Abstract: Training deep neural networks (DNNs) efficiently is challenging because the associated optimization problem is highly nonconvex. The backpropagation (backprop) algorithm has long been the most widely used method for computing gradients with respect to the parameters of DNNs, and it is used together with gradient descent-type algorithms for this optimization task. Recent work has empirically shown the efficiency of block coordinate descent (BCD)-type methods for training DNNs. In view of this, we propose a novel BCD-based algorithm for training DNNs and establish its global convergence within the powerful framework of the Kurdyka-Lojasiewicz (KL) property. Numerical experiments on standard datasets demonstrate that its efficiency is competitive with that of standard optimizers using backprop.
TL;DR: An efficient block coordinate descent algorithm is proposed for training deep neural networks. Its convergence guarantees are built upon the Kurdyka-Lojasiewicz (KL) property via a block multiconvex formulation of the training objective, and its competitive efficiency is demonstrated on the MNIST and CIFAR-10 datasets.
Keywords: Block coordinate descent, nonconvex optimization, Kurdyka-Lojasiewicz property, deep neural network training
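To make the idea of proximal block coordinate descent on a block multiconvex objective concrete, here is a minimal NumPy sketch on a toy problem. It is not the authors' algorithm: it assumes a two-layer linear network with loss 0.5 * ||W2 W1 X - Y||_F^2, which is convex in each weight block when the other block is fixed, and the function names (`proximal_bcd`, `prox_update_W1`, `prox_update_W2`) and the proximal weight `alpha` are hypothetical. Each block is updated by exactly minimizing the loss plus a proximal term (alpha/2) * ||W - W_prev||_F^2, cycling through the blocks Gauss-Seidel style.

```python
import numpy as np

# Hypothetical toy setup (not the paper's formulation): a two-layer LINEAR
# network, so the loss is block multiconvex in (W1, W2).

def loss(W1, W2, X, Y):
    """Toy objective 0.5 * ||W2 @ W1 @ X - Y||_F^2."""
    R = W2 @ W1 @ X - Y
    return 0.5 * float(np.sum(R * R))

def prox_update_W2(W1, W2, X, Y, alpha):
    # argmin_W 0.5||W A - Y||^2 + (alpha/2)||W - W2||^2 with A = W1 X.
    # Normal equations: W (A A^T + alpha I) = Y A^T + alpha W2.
    A = W1 @ X
    M = A @ A.T + alpha * np.eye(A.shape[0])  # symmetric positive definite
    B = Y @ A.T + alpha * W2
    return np.linalg.solve(M, B.T).T

def prox_update_W1(W1, W2, X, Y, alpha):
    # argmin_W 0.5||W2 W X - Y||^2 + (alpha/2)||W - W1||^2.
    # Optimality: P W Q + alpha W = C with P = W2^T W2, Q = X X^T,
    # solved via eigendecompositions of the symmetric PSD matrices P and Q.
    P = W2.T @ W2
    Q = X @ X.T
    C = W2.T @ Y @ X.T + alpha * W1
    p, U = np.linalg.eigh(P)
    q, V = np.linalg.eigh(Q)
    Z = (U.T @ C @ V) / (np.outer(p, q) + alpha)  # alpha > 0 keeps this safe
    return U @ Z @ V.T

def proximal_bcd(X, Y, hidden, alpha=1.0, iters=50, seed=0):
    """Cyclic proximal BCD over the blocks (W1, W2); returns the loss history."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.standard_normal((hidden, X.shape[0]))
    W2 = 0.1 * rng.standard_normal((Y.shape[0], hidden))
    history = [loss(W1, W2, X, Y)]
    for _ in range(iters):
        # Each block update uses the most recent value of the other block.
        W1 = prox_update_W1(W1, W2, X, Y, alpha)
        W2 = prox_update_W2(W1, W2, X, Y, alpha)
        history.append(loss(W1, W2, X, Y))
    return W1, W2, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 100))          # 5 features, 100 samples
    Y = rng.standard_normal((3, 5)) @ X         # linear targets, 3 outputs
    _, _, hist = proximal_bcd(X, Y, hidden=4, iters=30)
    print(f"loss: {hist[0]:.4f} -> {hist[-1]:.4f}")
```

Because each block update exactly minimizes the loss plus a proximal term that is nonnegative and vanishes at the previous iterate, the recorded loss is nonincreasing across sweeps. A sufficient-decrease property of this kind is one of the ingredients that KL-based convergence analyses typically rely on; the nonlinear DNN setting of the paper would require different block subproblems.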