Variable Activation Networks: A simple method to train deep feed-forward networks without skip-connections


Nov 03, 2017 (modified: Nov 03, 2017) ICLR 2018 Conference Blind Submission readers: everyone
  • Abstract: Novel architectures such as ResNets have enabled the training of very deep feed-forward networks via the introduction of skip-connections, leading to state-of-the-art results in many applications. Part of the success of ResNets has been attributed to improvements in the conditioning of the optimization problem (e.g., avoiding vanishing and shattered gradients). In this work we propose a simple method to extend these benefits to the context of deep networks without skip-connections. The proposed method poses the learning of weights in deep networks as a constrained optimization problem where the presence of skip-connections is penalized by Lagrange multipliers. This allows for skip-connections to be introduced during the early stages of training and subsequently phased out in a principled manner. We demonstrate the benefits of such an approach with experiments on MNIST, fashion-MNIST, CIFAR-10 and CIFAR-100 where the proposed method is shown to outperform many architectures without skip-connections and is often competitive with ResNets.
  • TL;DR: We introduce skip-connections penalized by Lagrange multipliers to train deep feed-forward networks. Skip-connections are thereby introduced during the early stages of training and iteratively phased out in a principled manner.
  • Keywords: optimization, vanishing gradients, shattered gradients, skip-connections
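The abstract describes the method only at a high level. Below is a minimal scalar sketch of the idea under stated assumptions. Each unit mixes an identity (skip) path with its nonlinearity through a coefficient α. The constraint α = 1 (no skip-connection) is attached to the loss with a Lagrange multiplier λ. Training then runs gradient descent on α and gradient ascent on λ. The mixing form, the names `variable_activation`, `train_alpha` and `toy_task_grad`, the quadratic stand-in for the task loss, and all hyperparameters are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Callable, List, Tuple


def variable_activation(x: float, alpha: float,
                        phi: Callable[[float], float] = math.tanh) -> float:
    """Hypothetical variable-activation unit (assumed form, not taken from the paper).

    Mixes an identity path (the skip-connection) with the nonlinearity:
    alpha = 0 gives a pure skip-connection, alpha = 1 a plain layer without one.
    """
    return (1.0 - alpha) * x + alpha * phi(x)


def train_alpha(task_grad: Callable[[float], float],
                alpha0: float = 0.0, lam0: float = 0.0,
                lr: float = 0.1, steps: int = 500) -> Tuple[float, float, List[float]]:
    """Gradient descent-ascent on L(alpha, lam) = task_loss(alpha) + lam * (1 - alpha).

    alpha is updated by descent and the multiplier lam by ascent, so any remaining
    constraint violation (alpha < 1) keeps increasing the pressure to drop the skip.
    """
    alpha, lam = alpha0, lam0
    history = []
    for _ in range(steps):
        grad_alpha = task_grad(alpha) - lam   # dL/dalpha
        grad_lam = 1.0 - alpha                # dL/dlam = constraint violation
        alpha -= lr * grad_alpha              # descent on the primal variable
        lam += lr * grad_lam                  # ascent on the Lagrange multiplier
        history.append(alpha)
    return alpha, lam, history


def toy_task_grad(alpha: float) -> float:
    """Hypothetical stand-in for the network's loss gradient w.r.t. alpha.

    Corresponds to task_loss(alpha) = 0.5 * (alpha - 0.2) ** 2, i.e. the task
    on its own would prefer a strong skip-connection (alpha = 0.2).
    """
    return alpha - 0.2


if __name__ == "__main__":
    alpha, lam, history = train_alpha(toy_task_grad)
    print(f"alpha after first step: {history[0]:.3f}")
    print(f"final alpha: {alpha:.6f}, final lambda: {lam:.6f}")
```

Starting from α = 0 (pure skip-connection), the multiplier grows while the constraint is violated and pulls α to 1. λ settles at 0.8, the slope of the toy task loss at α = 1, where the two gradient terms balance. In this toy the curvature of the task loss is what damps the oscillation of plain descent-ascent. With a flat task loss the simultaneous updates would spiral outward rather than converge, so a real implementation may need a different update scheme.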