Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time AlgorithmsDownload PDF

28 Sept 2020, 15:52 (modified: 10 Feb 2022, 11:49)ICLR 2021 PosterReaders: Everyone
Keywords: neural networks, theory, convex optimization, copositive programming, convex duality, nonnegative PCA, semi-nonnegative matrix factorization, computational complexity, global optima, semi-infinite duality, convolutional neural networks
Abstract: We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural network training problem. This semi-infinite dual admits a finite dimensional representation, but its support is over a convex set which is difficult to characterize. In particular, we demonstrate that the non-convex neural network training problem is equivalent to a finite-dimensional convex copositive program. Our work is the first to identify this strong connection between the global optima of neural networks and those of copositive programs. We thus demonstrate how neural networks implicitly attempt to solve copositive programs via semi-nonnegative matrix factorization, and draw key insights from this formulation. We describe the first algorithms for provably finding the global minimum of the vector output neural network training problem, which are polynomial in the number of samples for a fixed data rank, yet exponential in the dimension. However, in the case of convolutional architectures, the computational complexity is exponential in only the filter size and polynomial in all other parameters. We describe the circumstances in which we can find the global optimum of this neural network training problem exactly with soft-thresholded SVD, and provide a copositive relaxation which is guaranteed to be exact for certain classes of problems, and which corresponds with the solution of Stochastic Gradient Descent in practice.
One-sentence Summary: We demonstrate that two-layer vector-output ReLU networks can be expressed as copositive programs, and introduce algorithms for provably finding their global optima, which are polynomial in the number of samples for a fixed data rank.
Supplementary Material: zip
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Data: [CIFAR-10](, [CIFAR-100](
16 Replies