Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial TimeDownload PDF

28 Sep 2020 (modified: 30 Jan 2021)ICLR 2021 SpotlightReaders: Everyone
  • Keywords: Convex optimization, non-convex optimization, group sparsity, $\ell_1$ norm, convex duality, polynomial time, deep learning
  • Abstract: We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several two- and three-layer CNN architectures. We first prove that two-layer CNNs can be globally optimized via an $\ell_2$ norm regularized convex program. We then show that three-layer CNN training problems are equivalent to an $\ell_1$ regularized convex program that encourages sparsity in the spectral domain. We also extend these results to multi-layer CNN architectures including three-layer networks with two ReLU layers and deeper circular convolutions with a single ReLU layer. Furthermore, we present extensions of our approach to different pooling methods, which elucidates the implicit architectural bias as convex regularizers.
  • One-sentence Summary: We study the training problem for various CNN architectures with ReLU activations and introduce equivalent finite dimensional convex formulations that can be used to globally optimize these architectures.
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
  • Supplementary Material: zip
13 Replies