Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Published: 28 Jan 2022, Last Modified: 13 Feb 2023. ICLR 2022 Submitted.
Keywords: Deep learning, optimization, over-parametrization
Abstract: In recent years of theoretical machine learning research, over-parametrization has been shown to be a powerful tool for resolving many fundamental problems, such as the convergence analysis of deep neural networks. While many works have focused on designing algorithms for over-parametrized networks with one hidden layer, the multiple-hidden-layer setting has received much less attention due to the complexity of its analysis, and even fewer algorithms have been proposed for it. In this work, we initiate the study of the performance of second-order algorithms on multiple-hidden-layer over-parametrized neural networks. We propose a novel algorithm that trains such networks in time subquadratic in the network width. Our algorithm combines the Gram-Gauss-Newton method, tensor-based sketching techniques, and preconditioning.
One-sentence Summary: Training multi-layer over-parametrized networks using a second-order method in subquadratic time per iteration
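To give intuition for how sketching can speed up a second-order step, the following is a minimal NumPy sketch of a Gram-Gauss-Newton update with a randomly sketched Gram matrix. This is an illustrative toy, not the paper's algorithm: the matrix `J` stands in for the network Jacobian, the Gaussian projection replaces the paper's tensor-based sketches, and all dimensions are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy over-parametrized regime: n samples, p parameters, p >> n.
# s is the sketch dimension, with n < s << p.
n, p, s = 20, 5000, 500

J = rng.normal(size=(n, p)) / np.sqrt(p)   # stand-in for the network Jacobian
residual = rng.normal(size=n)              # stand-in for f(theta) - y

# Exact Gram-Gauss-Newton direction: J^T (J J^T)^{-1} r,
# where G = J J^T is the n x n Gram matrix (costly to form when p is large).
G_exact = J @ J.T
step_exact = J.T @ np.linalg.solve(G_exact, residual)

# Sketched variant: project J down to n x s with a random Gaussian map,
# so the Gram matrix is approximated from the much smaller sketch.
JS = J @ rng.normal(size=(p, s)) / np.sqrt(s)
G_sketch = JS @ JS.T
step_sketch = J.T @ np.linalg.solve(G_sketch, residual)

# The sketched Gram matrix concentrates around the exact one.
err = np.linalg.norm(G_sketch - G_exact, 2) / np.linalg.norm(G_exact, 2)
print(err)
```

The point of the sketch is that `G_sketch` is assembled from an `n x s` matrix rather than the full `n x p` Jacobian, which is where the per-iteration savings in the width come from; the paper's tensor-based sketches exploit the Kronecker structure of multi-layer Jacobians to realize this more cheaply than a dense Gaussian projection would.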