Mixed Precision Training

Paulius Micikevicius; Sharan Narang; Jonah Alben; Gregory Diamos; Erich Elsen; David Garcia; Boris Ginsburg; Michael Houston; Oleksii Kuchaiev; Ganesh Venkatesh; Hao Wu

Mixed Precision Training

Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu

15 Feb 2018 (modified: 22 Jun 2025)ICLR 2018 Conference Blind SubmissionReaders: Everyone

Abstract: Increasing the size of a neural network typically improves accuracy but also increases the memory and compute requirements for training the model. We introduce methodology for training deep neural networks using half-precision floating point numbers, without losing model accuracy or having to modify hyper-parameters. This nearly halves memory requirements and, on recent GPUs, speeds up arithmetic. Weights, activations, and gradients are stored in IEEE half-precision format. Since this format has a narrower range than single-precision we propose three techniques for preventing the loss of critical information. Firstly, we recommend maintaining a single-precision copy of weights that accumulates the gradients after each optimizer step (this copy is rounded to half-precision for the forward- and back-propagation). Secondly, we propose loss-scaling to preserve gradient values with small magnitudes. Thirdly, we use half-precision arithmetic that accumulates into single-precision outputs, which are converted to half-precision before storing to memory. We demonstrate that the proposed methodology works across a wide variety of tasks and modern large scale (exceeding 100 million parameters) model architectures, trained on large datasets.

Keywords: Half precision, float16, Convolutional neural networks, Recurrent neural networks

Code: [![Papers with Code](/images/pwc_icon.svg) 8 community implementations](https://paperswithcode.com/paper/?openreview=r1gs9JgRZ)

Data: [100DOH](https://paperswithcode.com/dataset/100doh)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 3 code implementations](https://www.catalyzex.com/paper/mixed-precision-training/code)

10 Replies

Loading