Farkas layers: don't shift the data, fix the geometryDownload PDF

25 Sep 2019 (modified: 24 Dec 2019)ICLR 2020 Conference Blind SubmissionReaders: Everyone
  • Original Pdf: pdf
  • Keywords: initialization, deep networks, residual networks, batch normalization, training, optimization
  • TL;DR: Geometric approach to mimicking effect of batch norm; can still train DNNs at large learning rate in the absence of all normalization
  • Abstract: Successfully training deep neural networks often requires either {batch normalization}, appropriate {weight initialization}, both of which come with their own challenges. We propose an alternative, geometrically motivated method for training. Using elementary results from linear programming, we introduce Farkas layers: a method that ensures at least one neuron is active at a given layer. Focusing on residual networks with ReLU activation, we empirically demonstrate a significant improvement in training capacity in the absence of batch normalization or methods of initialization across a broad range of network sizes on benchmark datasets.
7 Replies