Mixed Precision Training

Feb 15, 2018 (edited Mar 07, 2022) · ICLR 2018 Conference Blind Submission · Readers: Everyone
  • Keywords: Half precision, float16, Convolutional neural networks, Recurrent neural networks
  • Abstract: Increasing the size of a neural network typically improves accuracy but also increases the memory and compute requirements for training the model. We introduce a methodology for training deep neural networks using half-precision floating-point numbers, without losing model accuracy or having to modify hyper-parameters. This nearly halves memory requirements and, on recent GPUs, speeds up arithmetic. Weights, activations, and gradients are stored in IEEE half-precision format. Since this format has a narrower range than single-precision, we propose three techniques for preventing the loss of critical information. Firstly, we recommend maintaining a single-precision copy of weights that accumulates the gradients after each optimizer step (this copy is rounded to half-precision for the forward- and back-propagation). Secondly, we propose loss-scaling to preserve gradient values with small magnitudes. Thirdly, we use half-precision arithmetic that accumulates into single-precision outputs, which are converted to half-precision before storing to memory. We demonstrate that the proposed methodology works across a wide variety of tasks and modern large-scale (exceeding 100 million parameters) model architectures, trained on large datasets.
  • Code: [baidu-research/DeepBench](https://github.com/baidu-research/DeepBench) + [6 community implementations](https://paperswithcode.com/paper/?openreview=r1gs9JgRZ)
  • Data: [100DOH](https://paperswithcode.com/dataset/100doh)
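
The three techniques named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the variable names, the update size `1e-4`, the gradient value `1e-8`, and the loss-scale factor `1024` are all assumptions chosen only to make each float16 failure mode visible on scalar values.

```python
import numpy as np

# (1) Single-precision master copy of weights.
# A small update (hypothetical learning_rate * gradient) is lost when
# applied directly to a float16 weight, but survives in a float32 copy.
update = 1e-4
w16_after = np.float16(np.float16(1.0) - np.float16(update))
w32_after = np.float32(np.float32(1.0) - np.float32(update))
forward_weight = np.float16(w32_after)  # rounded copy used for forward/backward

# (2) Loss scaling.
# A tiny gradient underflows to zero in float16. Scaling the loss (and hence
# the gradient) before the float16 cast keeps it representable; the result is
# unscaled in float32 before the weight update.
true_grad = 1e-8
loss_scale = 1024.0  # illustrative value, not taken from the paper
grad_unscaled_fp16 = np.float16(true_grad)
grad_scaled_fp16 = np.float16(true_grad * loss_scale)
grad_recovered = np.float32(grad_scaled_fp16) / np.float32(loss_scale)

# (3) Half-precision arithmetic with single-precision accumulation.
# Summing float16 products in a float16 accumulator stalls once the
# accumulator's spacing exceeds the addend; a float32 accumulator does not.
a = np.ones(4096, dtype=np.float16)
b = np.ones(4096, dtype=np.float16)
acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for x, y in zip(a, b):
    prod = np.float16(x * y)             # float16 multiply
    acc16 = np.float16(acc16 + prod)     # float16 accumulate
    acc32 = acc32 + np.float32(prod)     # float32 accumulate
stored = np.float16(acc32)               # converted to float16 before storing

print("fp16 weight after update:", w16_after, "| fp32 master:", w32_after)
print("unscaled fp16 grad:", grad_unscaled_fp16, "| recovered:", grad_recovered)
print("fp16 accumulator:", acc16, "| fp32 accumulator:", acc32)
```

In this toy setting the float16 weight does not move, the unscaled gradient becomes zero, and the float16 accumulator stops growing partway through the sum, while the float32 master copy, the scaled gradient, and the float32 accumulator each retain the information.
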
10 Replies