An alternative approach to train neural networks using monotone variational inequality

Published: 26 Oct 2023, Last Modified: 13 Dec 2023. NeurIPS 2023 Workshop Poster.
Keywords: first-order methods, monotone variational inequality
TL;DR: We develop a first-order neural network training method based on monotone variational inequality and demonstrate faster initial convergence compared to SGD.
Abstract: We investigate an alternative approach to neural network training, a non-convex optimization problem, through the lens of a convex one: solving a monotone variational inequality (MVI), inspired by the work of [Juditsky and Nemirovski, 2019]. MVI solutions can be found by computationally efficient procedures, with performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery and prediction accuracy under the theoretical setting of training a single-layer linear neural network. We extend the use of MVI to training multi-layer neural networks by proposing a practical and completely general algorithm called \textit{stochastic variational inequality} (\texttt{SVI}), applicable to networks of any architecture. On both synthetic and real data prediction tasks, we show that \texttt{SVI} performs competitively with, or better than, the widely used stochastic gradient descent (SGD) method across various performance metrics, and is especially efficient in the early stage of training.
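To illustrate the single-layer setting the abstract refers to, the following is a minimal sketch, not the paper's actual \texttt{SVI} algorithm. It assumes the classical MVI formulation for generalized linear models from Juditsky and Nemirovski: for a model $y \approx \sigma(x^\top \theta)$ with a monotone link $\sigma$, the operator $F(\theta) = \frac{1}{n}\sum_i (\sigma(x_i^\top \theta) - y_i)\, x_i$ is monotone even though the squared loss is non-convex in $\theta$, so simple fixed-point iteration on $F$ recovers the model. All function names, step sizes, and data here are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mvi_operator(theta, X, y):
    # Monotone vector field F(theta) = (1/n) * X^T (sigmoid(X theta) - y).
    # Its root coincides with the true parameter for noiseless data,
    # and monotonicity makes the iteration below well-behaved.
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def svi_train(X, y, eta=0.5, steps=2000):
    # Deterministic fixed-point iteration theta <- theta - eta * F(theta);
    # a stochastic variant would use a minibatch estimate of F each step.
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= eta * mvi_operator(theta, X, y)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = sigmoid(X @ theta_true)  # noiseless targets, for illustration only
theta_hat = svi_train(X, y)
print(np.round(theta_hat, 2))
```

Note that the update uses the monotone operator $F$ rather than the gradient of the squared loss; this is the key distinction from SGD that the abstract highlights.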
Submission Number: 56