Keywords: wide neural networks, directed acyclic graph, transition to linearity, neural tangent kernel, over-parameterization
TL;DR: Feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their “width” approaches infinity.
Abstract: In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their ``width'' approaches infinity. The width of these general networks is characterized by the minimum in-degree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.
Supplementary Material: pdf