Decoupling Gating from Linearity

27 Sept 2018 (modified: 21 Apr 2024) · ICLR 2019 Conference Blind Submission · Readers: Everyone
Abstract: The gap between the empirical success of deep learning and the lack of strong theoretical guarantees calls for studying simpler models. By observing that a ReLU neuron is a product of a linear function with a gate (the latter determines whether the neuron is active or not), where both share a jointly trained weight vector, we propose to decouple the two. We introduce GaLU networks: networks in which each neuron is a product of a Linear Unit, defined by a weight vector that is trained, with a Gate, defined by a different weight vector that is not trained. Generally speaking, given a base model and a simpler version of it, the two criteria that determine the quality of the simpler version are whether its practical performance is close enough to that of the base model and whether it is easier to analyze theoretically. We show that GaLU networks perform similarly to ReLU networks on standard datasets, and we initiate a study of their theoretical properties, demonstrating that they are indeed easier to analyze. We believe that further research on GaLU networks may be fruitful for the development of a theory of deep learning.
Keywords: Artificial Neural Networks, Neural Networks, ReLU, GaLU, Deep Learning
TL;DR: We propose Gated Linear Unit networks — a model that performs similarly to ReLU networks on real data while being much easier to analyze theoretically.
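To illustrate the decoupling described in the abstract, here is a minimal sketch of a GaLU layer in PyTorch. This is not the authors' reference implementation: the class name `GaLULayer` is hypothetical, and fixing the gating weights at their random initialization is an assumption consistent with the abstract's statement that the gate's weight vector is not trained.

```python
import torch
import torch.nn as nn


class GaLULayer(nn.Module):
    """Sketch of a GaLU layer: a trained linear unit multiplied by a frozen gate."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Linear unit: its weights (and bias) are trained as usual.
        self.linear = nn.Linear(in_features, out_features)
        # Gate: a separate weight matrix, kept at its random initialization
        # (assumption) and never updated during training.
        self.gate = nn.Linear(in_features, out_features, bias=False)
        for p in self.gate.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each output coordinate is (linear response) * (0/1 gate).
        # A ReLU layer is the special case where the gate reuses the
        # linear unit's own weights instead of a separate frozen vector.
        return self.linear(x) * (self.gate(x) > 0).float()


# Example usage: a two-layer GaLU network on random data.
if __name__ == "__main__":
    net = nn.Sequential(GaLULayer(32, 64), GaLULayer(64, 10))
    x = torch.randn(8, 32)
    print(net(x).shape)  # torch.Size([8, 10])
```

In this sketch only the `linear` sub-modules contribute trainable parameters, which mirrors the paper's point that the gating pattern is decoupled from the weights being optimized.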
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:1906.05032/code) (via CatalyzeX)