Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models

Chaoyue Liu; Libin Zhu; Misha Belkin

Published: 28 Jan 2022. Last Modified: 13 Feb 2023. Venue: ICLR 2022 Spotlight.

Keywords: Assembling, linearity, Transition to linearity, wide neural networks

Abstract: Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent. These findings seem counter-intuitive, since neural networks are in general highly complex models. Why does a linear structure emerge when neural networks become wide? In this work, we provide a new perspective on this "transition to linearity" by viewing a neural network as an assembly model built recursively from a set of sub-models corresponding to individual neurons. In this view, we show that the linearity of wide neural networks is, in fact, an emerging property of assembling a large number of diverse "weak" sub-models, none of which dominates the assembly.
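The "assembly of weak sub-models" intuition can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' construction or proof: it assumes a two-layer network f(W; x) = (1/sqrt(m)) * sum_i v_i * tanh(w_i . x) with NTK-style 1/sqrt(m) output scaling, fixed output weights v_i = ±1, and only the first-layer weights W trained. Each neuron is one sub-model scaled by 1/sqrt(m). Because neurons do not share first-layer weights, the Hessian with respect to W is block diagonal, so its spectral norm equals the largest single-neuron block and shrinks like 1/sqrt(m). The gradient norm, by contrast, aggregates all m neurons and stays of order one. The function name `neuron_curvature_demo` and its parameters are hypothetical.

```python
import numpy as np


def neuron_curvature_demo(m, d=20, seed=0):
    """Return (gradient norm, Hessian spectral norm) w.r.t. first-layer weights
    for a hypothetical two-layer tanh network with 1/sqrt(m) output scaling."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    x /= np.linalg.norm(x)                 # unit-norm input, so ||x||^2 = 1
    W = rng.standard_normal((m, d))        # first-layer weights (trainable)
    v = rng.choice([-1.0, 1.0], size=m)    # fixed output weights, v_i = +/-1
    z = W @ x                              # pre-activations, z_i ~ N(0, 1)
    t = np.tanh(z)
    d1 = 1.0 - t**2                        # tanh'(z)
    d2 = -2.0 * t * d1                     # tanh''(z)
    # d f / d w_i = v_i * tanh'(z_i) * x / sqrt(m); since v_i^2 = 1 and ||x|| = 1,
    # the squared gradient norm is the mean of tanh'(z_i)^2 over all neurons.
    grad_norm = np.sqrt(np.mean(d1**2))
    # Hessian block i is v_i * tanh''(z_i) * x x^T / sqrt(m); the Hessian is
    # block diagonal, so its spectral norm is the largest block norm.
    hess_norm = np.max(np.abs(v * d2)) / np.sqrt(m)
    return grad_norm, hess_norm


for m in (10, 100, 1_000, 10_000):
    g, h = neuron_curvature_demo(m)
    print(f"width={m:>6}  ||grad||={g:.3f}  ||Hessian||={h:.5f}  ratio={h / g:.5f}")
```

Since |tanh''| is bounded by 4/(3*sqrt(3)) (about 0.77), the Hessian norm is at most about 0.77/sqrt(m), while the gradient norm concentrates around a width-independent constant. No single neuron dominates, so the curvature-to-gradient ratio, and with it the deviation from the linear (first-order Taylor) model within a fixed-radius ball, decreases as the width grows.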

One-sentence Summary: Transition to linearity of wide neural networks is an emerging property of assembling weak models corresponding to individual neurons
