Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization

Published: 07 Nov 2023, Last Modified: 13 Dec 2023. M3L 2023 Poster
Keywords: batch normalization, regularization, deep learning, neural networks, ghost batch normalization, additive noise, multiplicative noise
TL;DR: Noisy statistics in batch normalization are a potent source of regularization that helps generalization
Abstract: Normalization layers have been shown to benefit the training stability and generalization of deep neural networks in various ways. For Batch Normalization (BN), the noisy statistics have been observed to have a regularization effect that depends on the batch size. Following this observation, Hoffer et al. proposed Ghost Batch Normalization (GBN), in which BN is explicitly performed independently on smaller sub-batches, resulting in improved generalization in many settings. In this study, we analyze and isolate the effect of the noisy statistics by comparing BN and GBN, introducing a noise injection method. We then quantitatively assess the effects of the noise, juxtaposing it with other regularizers such as dropout and examining its potential role in the generalization disparities between batch normalization and its alternatives, including layer normalization and normalization-free methods.
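The mechanism the abstract describes, performing BN independently on smaller sub-batches, can be sketched as follows. This is a minimal NumPy illustration of the normalization step only, not the paper's implementation; the function name and signature are hypothetical, and a real GBN layer would also carry learnable scale/shift parameters and running statistics for inference.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """Normalize each ghost (sub-)batch independently, as in GBN.

    x: array of shape (batch, features); ghost_size must divide batch.
    Hypothetical helper for illustration only.
    """
    batch, _ = x.shape
    assert batch % ghost_size == 0, "ghost_size must divide the batch size"
    out = np.empty_like(x, dtype=float)
    for start in range(0, batch, ghost_size):
        gb = x[start:start + ghost_size]
        # Statistics are computed per ghost batch, so smaller ghost
        # batches yield noisier means/variances -- the regularization
        # source the paper studies.
        mean = gb.mean(axis=0)
        var = gb.var(axis=0)
        out[start:start + ghost_size] = (gb - mean) / np.sqrt(var + eps)
    return out
```

Note that with `ghost_size` equal to the full batch size, this reduces to plain batch normalization, which is why comparing the two isolates the effect of the extra statistical noise.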
Submission Number: 54