Keywords: Quantization
Abstract: Finding an optimal datatype for neural networks is a non-trivial problem with an exponential search space.
To address quantization effectively, we consider pseudo-quantization training (PQT) on microscaling (MX) datatypes.
Specifically, we propose a pseudo-quantization noise (PQN) based on $R\approx\lfloor\mathcal N(0,1)/2\rceil$.
This allows PQT to (1) optimize over the floating-point (FP) bit configuration, (2) help preserve the dynamic range of the original data, and (3) generate the noise $R$ efficiently.
We demonstrate that the proposed method enables stable and efficient pre-training of GPT2 and Llama2 language models with up to 1 billion (B) parameters for up to 295B tokens, and provides insights into optimal FP datatypes for model parameters.
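The sketch below illustrates, in PyTorch-style Python, how pseudo-quantization noise of the form $R\approx\lfloor\mathcal N(0,1)/2\rceil$ could be applied to weights grouped into MX-like blocks with a shared power-of-two scale. The block size, mantissa-bit count, and step computation are illustrative assumptions for exposition, not the paper's implementation.

```python
import torch

def pqn_noise(x: torch.Tensor) -> torch.Tensor:
    # Pseudo-quantization noise R ~ round(N(0, 1) / 2), following the
    # abstract's R ≈ ⌊N(0,1)/2⌉.  Rounding a half-width Gaussian gives
    # integer-valued noise that is 0 most of the time and ±1 occasionally.
    return torch.round(torch.randn_like(x) / 2)

def pqt_forward(w: torch.Tensor, block_size: int = 32, mantissa_bits: int = 2) -> torch.Tensor:
    """Hypothetical MX-style pseudo-quantization of a weight tensor.

    Weights are grouped into blocks of `block_size` elements sharing one
    power-of-two scale (as in microscaling datatypes), and the noise is
    applied at the granularity of one quantization step of that block.
    Parameter names and the step formula are assumptions, not the paper's code.
    """
    orig_shape = w.shape
    w_blocks = w.reshape(-1, block_size)

    # Shared power-of-two scale per block, derived from the block's max magnitude.
    max_abs = w_blocks.abs().amax(dim=-1, keepdim=True).clamp_min(1e-30)
    shared_exp = torch.floor(torch.log2(max_abs))
    # Assumed quantization step for an element with `mantissa_bits` mantissa bits.
    step = torch.exp2(shared_exp - mantissa_bits)

    # Pseudo-quantization: add step-scaled noise instead of rounding, so the
    # full-precision weights (and their gradients) are left intact.
    w_noisy = w_blocks + step * pqn_noise(w_blocks)
    return w_noisy.reshape(orig_shape)

if __name__ == "__main__":
    w = torch.randn(4, 64)
    print((pqt_forward(w) - w).abs().mean())
```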
Supplementary Material: zip
Primary Area: infrastructure, software libraries, hardware, systems, etc.
Submission Number: 15379