Farkas layers: don't shift the data, fix the geometry

Aram-Alexandre Pooladian; Chris Finlay; Adam M Oberman

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

Keywords: initialization, deep networks, residual networks, batch normalization, training, optimization

TL;DR: A geometric approach to mimicking the effect of batch normalization; deep networks can still be trained at large learning rates in the absence of all normalization.

Abstract: Successfully training deep neural networks often requires either batch normalization or appropriate weight initialization, both of which come with their own challenges. We propose an alternative, geometrically motivated method for training. Using elementary results from linear programming, we introduce Farkas layers: a method that ensures at least one neuron is active at a given layer. Focusing on residual networks with ReLU activation, we empirically demonstrate a significant improvement in training capacity in the absence of batch normalization or methods of initialization, across a broad range of network sizes on benchmark datasets.
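
To make the "at least one neuron is active" property concrete, here is a minimal sketch of one way such a guarantee can be obtained. It is an illustration under stated assumptions, not necessarily the authors' exact construction: it assumes the layer appends one extra neuron whose pre-activation is the negative sum of the others, so all pre-activations sum to zero and at least one must be non-negative. The names `farkas_linear`, `relu`, and `check_always_active` are hypothetical.

```python
import random


def farkas_linear(weights, biases, x):
    """Pre-activations of a linear layer plus one aggregate neuron.

    Assumed construction (not taken from the paper's text): the extra
    neuron is the negative sum of the other pre-activations, so the full
    vector sums to zero and cannot be entirely negative.
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    pre.append(-sum(pre))  # aggregate neuron
    return pre


def relu(values):
    """Elementwise ReLU."""
    return [max(0.0, v) for v in values]


def check_always_active(seed=0, in_dim=4, out_dim=3, trials=1000):
    """Return True if no random input makes every pre-activation negative."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [rng.gauss(0, 1) for _ in range(out_dim)]
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(in_dim)]
        if max(farkas_linear(weights, biases, x)) < 0:
            return False
    return True
```

With strongly negative biases, a plain ReLU layer would output all zeros for the input `[1.0, 1.0]` below, while the aggregate neuron in this sketch stays active: `farkas_linear([[1.0, 0.0], [0.0, 1.0]], [-5.0, -5.0], [1.0, 1.0])` yields two negative pre-activations and one positive one.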

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/farkas-layers-don-t-shift-the-data-fix-the/code)
