Characterizing signal propagation to close the performance gap in unnormalized ResNets

Published: 12 Jan 2021, Last Modified: 05 May 2023
ICLR 2021 Poster
Readers: Everyone
Keywords: normalizers, signal propagation, deep learning, neural networks, ResNets, EfficientNets, ImageNet, CNNs, ConvNets
Abstract: Batch Normalization is a key component in almost all state-of-the-art image classifiers, but it also introduces practical challenges: it breaks the independence between training examples within a batch, can incur compute and memory overhead, and often results in unexpected bugs. Building on recent theoretical analyses of deep ResNets at initialization, we propose a simple set of analysis tools to characterize signal propagation on the forward pass, and leverage these tools to design highly performant ResNets without activation normalization layers. Crucial to our success is an adapted version of the recently proposed Weight Standardization. Our analysis tools show how this technique preserves the signal in ReLU networks by ensuring that the per-channel activation means do not grow with depth. Across a range of FLOP budgets, our networks attain performance competitive with state-of-the-art EfficientNets on ImageNet.
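The two ingredients the abstract names, Scaled Weight Standardization and the forward-pass signal propagation measurements, are simple enough to sketch. Below is a minimal, illustrative JAX sketch, not the paper's exact implementation (see the linked nfnets code for that): each output unit's incoming weights are standardized to zero mean and variance 1/fan-in, then multiplied by the ReLU-specific gain sqrt(2 / (1 − 1/π)) so that post-ReLU activations retain roughly unit variance. The helper names (`scaled_ws`, `signal_prop_demo`) and the fully-connected setting are assumptions made for brevity; the paper works with convolutional residual networks.

```python
import jax
import jax.numpy as jnp

def scaled_ws(w, eps=1e-4):
    """Scaled Weight Standardization (sketch) for a dense kernel [in, out].

    Each output unit's fan-in weights are shifted to zero mean and scaled
    to variance 1/fan_in, then multiplied by the ReLU gain
    gamma = sqrt(2 / (1 - 1/pi)) so post-ReLU variance stays ~1.
    """
    fan_in = w.shape[0]
    mean = jnp.mean(w, axis=0, keepdims=True)
    var = jnp.var(w, axis=0, keepdims=True)
    gamma = jnp.sqrt(2.0 / (1.0 - 1.0 / jnp.pi))
    return gamma * (w - mean) / jnp.sqrt(var * fan_in + eps)

def signal_prop_demo(depth=50, width=256, batch=1024, seed=0):
    """Track the forward-pass statistics the paper's analysis tools plot:
    the average squared channel mean and the average channel variance.
    Because the standardized weights have zero mean, the per-channel
    activation means cannot accumulate with depth, so both stay O(1)."""
    key = jax.random.PRNGKey(seed)
    key, sub = jax.random.split(key)
    x = jax.random.normal(sub, (batch, width))
    for d in range(depth):
        key, sub = jax.random.split(key)
        w = jax.random.normal(sub, (width, width))
        x = jax.nn.relu(x @ scaled_ws(w))
        if (d + 1) % 10 == 0:
            chan_mean = jnp.mean(x, axis=0)
            print(f"layer {d + 1:3d}: avg squared channel mean "
                  f"{jnp.mean(chan_mean ** 2):.3f}, "
                  f"avg channel variance {jnp.mean(jnp.var(x, axis=0)):.3f}")

signal_prop_demo()
```

Running the demo, both printed statistics settle at a depth-independent fixed point; replacing `scaled_ws(w)` with the raw kernel `w` (suitably scaled) lets the squared channel means grow with depth, which is the failure mode the paper's analysis tools are designed to expose.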
One-sentence Summary: We show how to train ResNets entirely without normalization layers and attain performance competitive with batch-normalized EfficientNets.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Code: [deepmind/deepmind-research](https://github.com/deepmind/deepmind-research/tree/master/nfnets) + [3 community implementations on Papers with Code](https://paperswithcode.com/paper/?openreview=IX3Nnir2omJ)