Farkas layers: don't shift the data, fix the geometry

Sep 25, 2019 Blind Submission readers: everyone Show Bibtex
  • Keywords: initialization, deep networks, residual networks, batch normalization, training, optimization
  • TL;DR: Geometric approach to mimicking effect of batch norm; can still train DNNs at large learning rate in the absence of all normalization
  • Abstract: Successfully training deep neural networks often requires either {batch normalization}, appropriate {weight initialization}, both of which come with their own challenges. We propose an alternative, geometrically motivated method for training. Using elementary results from linear programming, we introduce Farkas layers: a method that ensures at least one neuron is active at a given layer. Focusing on residual networks with ReLU activation, we empirically demonstrate a significant improvement in training capacity in the absence of batch normalization or methods of initialization across a broad range of network sizes on benchmark datasets.
  • Original Pdf:  pdf
0 Replies