Extreme normalization: approximating full-data batch normalization with single examples

Published: 28 Jan 2022 · Last Modified: 13 Feb 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: batch normalization, optimization
Abstract: While batch normalization has been successful in speeding up the training of neural networks, it is not well understood. We cast batch normalization as an approximation of the limiting case where the entire dataset is normalized jointly, and explore other ways to approximate the gradient from this limiting case. We demonstrate an approximation that removes the need to keep more than one example in memory at any given time, at the cost of a small factor increase in the training step computation. We also present a fully per-example training procedure, which removes the extra computation at the cost of a small drop in the final model accuracy. We further use our insights to improve batch renormalization for very small minibatches. Unlike previously proposed methods, our normalization does not change the function class of the inference model, and performs well in the absence of identity shortcuts.
One-sentence Summary: We cast batch normalization as an approximation of full-batch normalization, and propose an alternative with minimal (or no) cross-example interactions
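The abstract does not spell out the gradient approximation itself. As a rough illustration only, the sketch below shows one way a layer could normalize a single example at a time using running (EMA) estimates of the full-data statistics, so that the same normalization function is used at training and inference. The module name `PerExampleNorm`, the EMA update rule, and the momentum value are assumptions for illustration, not the paper's actual method.

```python
# Hypothetical sketch: per-example normalization with running estimates of
# full-data statistics, instead of per-minibatch statistics. Not the paper's
# exact algorithm; names and the EMA update are illustrative assumptions.
import torch
import torch.nn as nn

class PerExampleNorm(nn.Module):
    def __init__(self, num_features, momentum=0.01, eps=1e-5):
        super().__init__()
        self.momentum = momentum
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Running estimates stand in for the full-dataset statistics.
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):  # x: (1, C, H, W) -- a single example
        if self.training:
            with torch.no_grad():
                # Update the estimate of the full-data statistics from this one example.
                mean = x.mean(dim=(0, 2, 3))
                var = x.var(dim=(0, 2, 3), unbiased=False)
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        # Normalize with the running estimates, exactly as at inference time,
        # so the function class of the inference model is unchanged.
        x_hat = (x - self.running_mean[None, :, None, None]) / torch.sqrt(
            self.running_var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]

# Usage: process one example at a time, with no cross-example interaction.
norm = PerExampleNorm(num_features=16)
y = norm(torch.randn(1, 16, 8, 8))
```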
5 Replies
