Farkas layers: don't shift the data, fix the geometry

25 Sept 2019 (modified: 22 Oct 2023) · ICLR 2020 Conference Blind Submission · Readers: Everyone
Keywords: initialization, deep networks, residual networks, batch normalization, training, optimization
TL;DR: A geometric approach to mimicking the effect of batch norm; DNNs can still be trained at large learning rates in the absence of all normalization.
Abstract: Successfully training deep neural networks often requires either batch normalization or appropriate weight initialization, both of which come with their own challenges. We propose an alternative, geometrically motivated method for training. Using elementary results from linear programming, we introduce Farkas layers: a method that ensures at least one neuron is active at a given layer. Focusing on residual networks with ReLU activation, we empirically demonstrate a significant improvement in training capacity in the absence of batch normalization or methods of initialization across a broad range of network sizes on benchmark datasets.
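
The abstract's central idea, guaranteeing that at least one ReLU neuron fires at a given layer via an elementary linear-programming (Farkas-style) argument, can be illustrated with a small sketch. The snippet below is a hypothetical construction and not necessarily the paper's exact layer: it appends one "balancing" neuron whose weight row and bias are the negatives of the sums of the others, so the pre-activations sum to zero for every input and at least one is therefore nonnegative. The class name `FarkasLinear` and all implementation details are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FarkasLinear(nn.Module):
    """Hypothetical sketch (not the authors' exact construction).

    out_features - 1 rows of the weight matrix are learned; the last row and
    bias are set to the negatives of their sums. Pre-activations then sum to
    zero for any input, so at least one is >= 0 and at least one ReLU unit
    is active (unless all pre-activations are exactly zero).
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Learn out_features - 1 rows; the final "balancing" row is derived.
        self.weight = nn.Parameter(0.01 * torch.randn(out_features - 1, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features - 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_last = -self.weight.sum(dim=0, keepdim=True)  # balancing weight row
        b_last = -self.bias.sum(dim=0, keepdim=True)    # balancing bias
        weight = torch.cat([self.weight, w_last], dim=0)
        bias = torch.cat([self.bias, b_last], dim=0)
        z = F.linear(x, weight, bias)                   # rows of z sum to zero
        return torch.relu(z)                            # at least one unit stays active


# Example usage: layer = FarkasLinear(128, 64); y = layer(torch.randn(32, 128))
```

The point of the sketch is only the guarantee itself: because the derived row forces the pre-activations to sum to zero, the layer cannot be entirely "dead", which is the property the abstract attributes to Farkas layers.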
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:1910.02840/code)