Training with Mixed-Precision Floating-Point Assignments
Abstract: When training deep neural networks, keeping all tensors in high precision (e.g., 32-bit or even 16-bit floats) is often wasteful. However, keeping all tensors in low precision (e.g., 8-bit floats) can lead to unacceptable accuracy loss. Hence, it is important to use a precision assignment—a mapping from all tensors (arising in training) to precision levels (high or low)—that keeps most of the tensors in low precision and leads to sufficiently accurate models. We provide a technique that explores this memory-accuracy tradeoff by generating precision assignments for convolutional neural networks that (i) use less memory and (ii) lead to more accurate convolutional networks at the same time, compared to the precision assignments considered by prior work in low-precision floating-point training. We evaluate our technique on image classification tasks by training convolutional networks on CIFAR-10, CIFAR-100, and ImageNet. Our method typically provides > 2× memory reduction over a baseline precision assignment while preserving training accuracy, and gives further reductions by trading off accuracy. Compared to other baselines which sometimes cause training to diverge, our method provides similar or better memory reduction while avoiding divergence.
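To make the core notion concrete: a precision assignment is simply a map from each training tensor to a precision level, and the memory-accuracy tradeoff follows from how many tensors the map keeps in low precision. The sketch below is a hypothetical illustration (not the paper's actual algorithm), assuming "high" means 32-bit floats (4 bytes/element) and "low" means 8-bit floats (1 byte/element), with made-up tensor names and sizes:

```python
# Bytes per element at each precision level (high = fp32, low = fp8).
BYTES = {"high": 4, "low": 1}

def memory_bytes(tensor_sizes, assignment):
    """Total memory of all tensors under a given precision assignment."""
    return sum(n * BYTES[assignment[name]] for name, n in tensor_sizes.items())

# Toy element counts for a small convolutional network (illustrative only).
sizes = {
    "conv1.weight": 1728,
    "conv1.act": 65536,
    "conv2.weight": 36864,
    "conv2.act": 32768,
    "fc.weight": 5130,
}

# Baseline assignment: every tensor in high precision.
all_high = {name: "high" for name in sizes}

# A mixed assignment: keep only the final layer's weights in high precision.
mixed = {name: ("high" if name.startswith("fc") else "low") for name in sizes}

reduction = memory_bytes(sizes, all_high) / memory_bytes(sizes, mixed)
print(f"memory reduction: {reduction:.2f}x")  # well above the 2x mentioned above
```

The challenge the paper addresses is choosing such an assignment automatically so that the reduction is large while training still converges to an accurate model.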
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:

Major changes made on June 21:
* We have added author information and the acknowledgments section.

Major changes made on June 8 (all changes are marked in red):
* We have added Footnote 3 (page 8), to clarify the phrase "< 3% of the maximum model aggregate".

Major changes made on May 25 (all changes are marked in red):
* We have edited the abstract (page 1), to further clarify our contributions.

Major changes made on May 1 (all changes are marked in red):
* We have added text to the abstract and introduction (pages 1-3), to highlight that our experiments consider image classification tasks and convolutional networks.
* We have added text and Algorithms 1-2 to Section 4 (pages 6-7) and edited text in Section 5.1 (page 9), to make our algorithms clearer.
* We have added text to Sections 5.2-5.3 (pages 9-10), to clarify that we measured and reported the average of the low-precision ratio over all epochs of each training run.
* We have added Figures 13-15 to Appendix C.1 (pages 27-28) and related text to Section 5.3 and Appendix C.1 (pages 10, 12, and 22), to compare our assignments against existing assignments equipped with our precision-promotion technique.
* We have added Figure 9 to Appendix C.1 (page 23), to provide a zoomed-in version of Figure 3 (left).
Assigned Action Editor: ~Nadav_Cohen1
Submission Number: 933