Keywords: monotone variational inequality, graph neural networks, neural network training
TL;DR: We investigate training neural networks via monotone variational inequalities, obtaining performance guarantees and performance competitive with or better than widely-used stochastic gradient descent methods, especially during the initial phase of training.
Abstract: This paper investigates an alternative approach to neural network training, which is inherently a non-convex optimization problem, through the lens of a convex problem: solving a monotone variational inequality (MVI), inspired by recent work of Juditsky and Nemirovski (2019). MVI solutions can be found by computationally efficient procedures, with performance guarantees in the form of $\ell_2$ and $\ell_{\infty}$ bounds on model recovery and prediction accuracy in the theoretical setting of training a single-layer linear neural network. We study the use of MVI for training multi-layer neural networks by proposing a practical and completely general algorithm called \textit{stochastic variational inequality} (\texttt{SVI}). We demonstrate its applicability to training fully-connected neural networks, graph neural networks (GNNs), and convolutional neural networks (CNNs); \texttt{SVI} is completely general and applies to other network architectures as well. On both synthetic and real network data prediction tasks, \texttt{SVI} performs competitively with or better than widely-used stochastic gradient descent methods across various performance metrics, and is notably more efficient in the early stage of training.
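For intuition, the single-layer setting mentioned in the abstract admits a simple sketch: for a model $y \approx \sigma(X\theta)$ with a non-decreasing activation $\sigma$, the operator $F(\theta) = \frac{1}{n} X^\top(\sigma(X\theta) - y)$ has a positive semi-definite Jacobian and is therefore monotone, so the parameters can be recovered by driving $F(\theta)$ to zero rather than minimizing a possibly non-convex loss. The Python snippet below is a minimal illustrative sketch of such a stochastic operator iteration; the choice of `tanh`, the step-size schedule, and the batch size are assumptions for illustration and do not reproduce the paper's exact \texttt{SVI} algorithm.

```python
import numpy as np

# Sketch: recover theta by (stochastically) zeroing the monotone operator
# F(theta) = (1/n) * X.T @ (sigma(X @ theta) - y), instead of minimizing a loss.
rng = np.random.default_rng(0)
n, d = 1000, 20
sigma = np.tanh                      # non-decreasing activation => monotone F

X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = sigma(X @ theta_true) + 0.05 * rng.normal(size=n)

def operator(theta, Xb, yb):
    """Monotone operator F(theta) evaluated on a mini-batch."""
    return Xb.T @ (sigma(Xb @ theta) - yb) / len(yb)

theta = np.zeros(d)
step = 0.5
for t in range(2000):
    idx = rng.choice(n, size=64, replace=False)          # mini-batch
    theta -= step / np.sqrt(t + 1) * operator(theta, X[idx], y[idx])

print("relative l2 recovery error:",
      np.linalg.norm(theta - theta_true) / np.linalg.norm(theta_true))
```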
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Optimization (eg, convex and non-convex optimization)
Supplementary Material: zip