Training Neural Networks with Low-Precision Model Memory

Published: 01 Feb 2023 · Last Modified: 13 Feb 2023 · Submitted to ICLR 2023
Keywords: memory efficient deep learning, stochastic gradient descent, quantization
TL;DR: We propose memory-efficient optimizers for deep learning that keep model parameters, momentum, and gradient accumulators in low numerical precision.
Abstract: The demand for memory to store model-related statistics ("model memory") is a major bottleneck for training large neural networks. A promising solution is low-precision optimizers, which reduce the numerical precision of the model memory. However, existing work compresses only the momentum, resulting in suboptimal memory efficiency. This paper proposes Low-Precision Model Memory (LPMM), an optimization framework that keeps the entire model memory in low precision. LPMM compresses not only the momentum but also the model parameters and gradient accumulators. We identify arithmetic underflow as the main obstacle to building low-precision optimizers and propose a stochastic quantization method and a microbatching technique to overcome it. We analyze the convergence behavior of LPMM and theoretically show how the proposed techniques affect underflow, which in turn affects convergence. We apply LPMM to the SGD optimizer with momentum (SGDM). On several realistic benchmarks, LPMM-SGDM trains neural networks with negligible loss of accuracy while reducing model memory by over 70% compared to full-precision SGDM.
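The abstract names stochastic quantization as the remedy for arithmetic underflow, i.e., updates that are smaller than the quantization step of the low-precision parameter and momentum buffers and would vanish under nearest rounding. The sketch below illustrates that general idea with unbiased stochastic rounding onto a fixed-point grid; the function names, grid spacing, hyperparameters, and the SGDM loop are illustrative assumptions and not the paper's implementation.

```python
# Hypothetical sketch (not the authors' code): unbiased stochastic rounding to a
# fixed-point grid, showing why it avoids the underflow that stalls nearest
# rounding when updates are smaller than the quantization step.
import torch

def stochastic_round(x: torch.Tensor, step: float) -> torch.Tensor:
    """Round x to multiples of `step`, rounding up with probability equal to
    the fractional remainder, so the result is unbiased: E[round(x)] = x."""
    scaled = x / step
    low = torch.floor(scaled)
    prob_up = scaled - low                           # fractional part in [0, 1)
    return (low + (torch.rand_like(x) < prob_up).float()) * step

def sgdm_step_lowprec(param_q, momentum_q, grad, lr=0.1, beta=0.9, step=2**-8):
    """One SGD-with-momentum step where both the parameters and the momentum
    buffer live on a coarse quantization grid of spacing `step` (assumed setup)."""
    momentum_q = stochastic_round(beta * momentum_q + grad, step)
    param_q = stochastic_round(param_q - lr * momentum_q, step)
    return param_q, momentum_q

if __name__ == "__main__":
    torch.manual_seed(0)
    p = torch.zeros(4)
    m = torch.zeros(4)
    tiny_grad = torch.full((4,), 1e-4)               # update far below the grid step
    for _ in range(1000):
        p, m = sgdm_step_lowprec(p, m, tiny_grad)
    # Nearest rounding would leave p at exactly 0 (underflow); stochastic
    # rounding moves p by roughly the accumulated -lr * momentum in expectation.
    print(p)
```

The microbatching technique mentioned in the abstract attacks the same problem from the other direction: accumulating gradients over several microbatches before applying a quantized update makes the update larger relative to the quantization step, reducing the chance of underflow in the first place.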
