Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization

Kazusato Oko; Taiji Suzuki; Atsushi Nitanda; Denny Wu

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 | ICLR 2022 Poster | Readers: Everyone

Keywords: Neural Network Optimization, Mean field Regime, Overparameterization

Abstract: We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves an exponential convergence rate in regularized empirical risk minimization. The proposed algorithm can be regarded as an infinite-dimensional extension of Stochastic Dual Coordinate Ascent (SDCA) in the probability space: we exploit the convexity of the dual problem, for which the coordinate-wise proximal gradient method can be applied. Our proposed method inherits advantages of the original SDCA, including (i) exponential convergence (with respect to the outer iteration steps), and (ii) better dependence on the sample size and condition number than the full-batch gradient method. One technical challenge in implementing the SDCA update is the intractable integral over the entire parameter space at every step. To overcome this limitation, we propose a tractable particle method that approximately solves the dual problem, and an importance re-weighting technique to reduce the computational cost. The convergence rate of our method is verified by numerical experiments.
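
For readers unfamiliar with the method being extended, the following is a minimal sketch of classical, finite-dimensional SDCA on ridge regression. It is not the Particle-SDCA algorithm of the paper, which works over probability measures and uses a particle approximation. The objective (1/n) Σ ½(xᵢᵀw − yᵢ)² + (λ/2)‖w‖², the function names `sdca_ridge` and `ridge_closed_form`, and all parameter values are assumptions chosen for illustration. The sketch shows the coordinate-wise dual update that the abstract refers to: one dual variable per training example, updated in closed form one at a time.

```python
import numpy as np


def sdca_ridge(X, y, lam, epochs=200, seed=0):
    """Classical SDCA for (1/n) * sum_i 0.5*(x_i^T w - y_i)^2 + (lam/2)*||w||^2.

    Illustrative only: this is the finite-dimensional method, not Particle-SDCA.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)   # one dual variable per training example
    w = np.zeros(d)       # primal iterate, kept equal to X^T alpha / (lam * n)
    sq_norms = np.einsum("ij,ij->i", X, X)  # ||x_i||^2 for every row
    for _ in range(epochs * n):
        i = rng.integers(n)  # pick one dual coordinate uniformly at random
        # Closed-form maximizer of the dual objective along coordinate i
        # (valid for the squared loss 0.5*(a - y_i)^2).
        delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
        alpha[i] += delta
        w += delta * X[i] / (lam * n)
    return w, alpha


def ridge_closed_form(X, y, lam):
    """Exact minimizer of the same objective, used here as a reference."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)
```

On a small synthetic problem, the SDCA iterate can be compared against `ridge_closed_form`; the two should agree to high precision after enough passes, consistent with the linear (exponential) convergence of SDCA for smooth losses.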

One-sentence Summary: We propose a new algorithm for optimizing two-layer neural networks in the mean field regime that achieves exponential convergence in regularized empirical risk minimization (w.r.t. outer loop iterations).

Supplementary Material: zip
