Abstract: Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models.
Previous work has analyzed low-precision training algorithms, such as low-precision stochastic gradient descent, and derived theoretical bounds on their convergence rates.
These bounds tend to depend on the dimension $d$ of the model: the number of bits needed to achieve a particular error bound increases as $d$ grows.
This is undesirable because a motivating application for low-precision training is large-scale models, such as deep learning, where $d$ can be huge.
In this paper, we prove dimension-independent bounds for low-precision training algorithms that use fixed-point arithmetic, which lets us better understand what affects the convergence of these algorithms as the number of model parameters grows.
Our methods also generalize naturally to let us prove new convergence bounds on low-precision training with other quantization schemes, such as low-precision floating-point computation and logarithmic quantization.
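To make the setting concrete, here is a minimal sketch (not the authors' code) of low-precision SGD that keeps the iterate on a fixed-point grid using unbiased stochastic rounding; the function names, grid spacing `delta`, bit width, and toy quadratic objective are all illustrative assumptions.

```python
import numpy as np

def quantize_fixed_point(x, delta, num_bits, rng):
    """Stochastically round x onto a fixed-point grid with spacing delta
    and num_bits bits (one sign bit), clipping to the representable range."""
    limit = delta * (2 ** (num_bits - 1) - 1)     # largest representable magnitude
    scaled = np.clip(x, -limit, limit) / delta
    low = np.floor(scaled)
    prob_up = scaled - low                        # round up w.p. = fractional part
    return (low + (rng.random(x.shape) < prob_up)) * delta

def low_precision_sgd(grad_fn, w0, lr, delta, num_bits, num_steps, seed=0):
    """SGD that re-quantizes the iterate to the fixed-point grid after every step."""
    rng = np.random.default_rng(seed)
    w = quantize_fixed_point(np.asarray(w0, dtype=float), delta, num_bits, rng)
    for _ in range(num_steps):
        g = grad_fn(w, rng)                       # stochastic gradient oracle
        w = quantize_fixed_point(w - lr * g, delta, num_bits, rng)
    return w

# Toy usage: noisy gradients of f(w) = 0.5 * ||w||^2
grad = lambda w, rng: w + 0.1 * rng.standard_normal(w.shape)
print(low_precision_sgd(grad, w0=np.ones(4), lr=0.1,
                        delta=1 / 256, num_bits=8, num_steps=1000))
```

Other quantization schemes mentioned in the abstract, such as low-precision floating point or logarithmic quantization, would swap in a different `quantize_*` routine while the outer SGD loop stays the same.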
Keywords: low precision, stochastic gradient descent
TL;DR: We prove dimension-independent bounds for low-precision training algorithms.