TL;DR: This paper theoretically establishes the universal approximation capacity of a network composed only of linear and normalization layers.
Abstract: The universal approximation theorem (UAT) is a fundamental result for deep neural networks (DNNs), demonstrating their powerful capacity to approximate arbitrary functions. Existing analyses and proofs of the UAT consider traditional networks built from linear and nonlinear activation layers only, omitting the normalization layers that are widely used to ease the training of modern networks. This paper studies the UAT for DNNs with normalization layers for the first time. We theoretically prove that an infinitely wide network---consisting only of parallel layer normalizations (PLN) and linear layers---has universal approximation capacity. We further investigate the minimum number of neurons required to approximate $L$-Lipschitz continuous functions with a single-hidden-layer network. We compare the approximation capacity of PLN with that of traditional activation functions, both theoretically and through experiments. We also demonstrate PLN's approximation capacity in CNNs and Transformers experimentally.
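The abstract describes networks built only from linear layers and parallel layer normalizations (PLN). A minimal sketch of one such block is given below, assuming the parallel normalizations act on disjoint groups of hidden units; the exact grouping is an illustrative assumption, not the paper's stated construction.

```python
# Minimal sketch (PyTorch) of a "linear + parallel layer normalization" block.
# The split into parallel LayerNorm branches over disjoint feature groups is an
# assumption for illustration only.
import torch
import torch.nn as nn


class LinearPLNBlock(nn.Module):
    """Linear layer followed by several LayerNorms applied in parallel,
    each acting on its own slice of the hidden features."""

    def __init__(self, in_dim: int, hidden_dim: int, num_groups: int):
        super().__init__()
        assert hidden_dim % num_groups == 0
        self.linear = nn.Linear(in_dim, hidden_dim)
        self.group_size = hidden_dim // num_groups
        # One LayerNorm per parallel group.
        self.norms = nn.ModuleList(
            nn.LayerNorm(self.group_size) for _ in range(num_groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)                        # (batch, hidden_dim)
        chunks = h.split(self.group_size, dim=-1)
        normed = [ln(c) for ln, c in zip(self.norms, chunks)]
        return torch.cat(normed, dim=-1)          # no elementwise activation


# Usage: the only nonlinearity comes from the parallel layer normalizations.
if __name__ == "__main__":
    net = nn.Sequential(LinearPLNBlock(8, 64, num_groups=16), nn.Linear(64, 1))
    y = net(torch.randn(32, 8))
    print(y.shape)  # torch.Size([32, 1])
```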
Primary Area: Deep Learning->Theory
Keywords: Universal Approximation Theorem, Normalization
Submission Number: 4254