TL;DR: This paper theoretically establishes the universal approximation capacity of a network composed only of linear and normalization layers.
Abstract: The universal approximation theorem (UAT) is a fundamental result for deep neural networks (DNNs), demonstrating their powerful capacity to approximate arbitrary functions. Existing analyses and proofs of the UAT consider traditional networks built from linear and nonlinear activation layers only, omitting the normalization layers that are widely used to ease the training of modern networks. This paper studies the UAT for DNNs with normalization layers for the first time. We theoretically prove that an infinitely wide network---consisting only of parallel layer normalizations (PLN) and linear layers---has universal approximation capacity. We further investigate the minimum number of neurons required to approximate $L$-Lipschitz continuous functions with a single-hidden-layer network. We compare the approximation capacity of PLN with that of traditional activation functions, both theoretically and through experiments. We also demonstrate PLN's approximation capacity in CNNs and Transformers experimentally.
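The abstract describes networks built only from linear layers and parallel layer normalizations (PLN). A minimal sketch of one such block is given below, assuming the parallel normalizations act on disjoint groups of hidden units; the exact grouping is an illustrative assumption, not the paper's stated construction.

```python
# Minimal sketch (PyTorch) of a "linear + parallel layer normalization" block.
# The split into parallel LayerNorm branches over disjoint feature groups is an
# assumption for illustration only.
import torch
import torch.nn as nn


class LinearPLNBlock(nn.Module):
    """Linear layer followed by several LayerNorms applied in parallel,
    each acting on its own slice of the hidden features."""

    def __init__(self, in_dim: int, hidden_dim: int, num_groups: int):
        super().__init__()
        assert hidden_dim % num_groups == 0
        self.linear = nn.Linear(in_dim, hidden_dim)
        self.group_size = hidden_dim // num_groups
        # One LayerNorm per parallel group.
        self.norms = nn.ModuleList(
            nn.LayerNorm(self.group_size) for _ in range(num_groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)                        # (batch, hidden_dim)
        chunks = h.split(self.group_size, dim=-1)
        normed = [ln(c) for ln, c in zip(self.norms, chunks)]
        return torch.cat(normed, dim=-1)          # no elementwise activation


# Usage: the only nonlinearity comes from the parallel layer normalizations.
if __name__ == "__main__":
    net = nn.Sequential(LinearPLNBlock(8, 64, num_groups=16), nn.Linear(64, 1))
    y = net(torch.randn(32, 8))
    print(y.shape)  # torch.Size([32, 1])
```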
Primary Area: Deep Learning->Theory
Keywords: Universal Approximation Theorem, Normalization
Submission Number: 4254