Abstract: The increasing computational demands of training large language models (LLMs) call for more efficient methods, such as quantized training, which uses low-bit arithmetic operations to reduce costs. While FP8 precision has demonstrated its potential, leveraging FP4 remains challenging due to significant quantization errors and limited representational capability. Building on the Transformer architecture, we present an FP4 training scheme for LLMs that overcomes these obstacles by applying different quantization approaches to different modules and different stages of training. The framework ensures stability by incorporating mixed-precision training and fine-grained quantization methods. The Transformer's linear layers are particularly well suited to this low-precision training approach, facilitating efficient computation and scalability.
Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8 at a lower theoretical computational cost. With the advent of next-generation hardware supporting FP4, our framework lays the foundation for efficient ultra-low-precision training.
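To make the abstract's notion of fine-grained FP4 quantization of Transformer linear layers concrete, here is a minimal sketch in PyTorch. It simulates group-wise (per-group scale) FP4 fake quantization of the GEMM inputs; the E2M1 value set, the group size, and the function name `fake_quant_fp4` are illustrative assumptions and do not reproduce the paper's actual implementation or hardware kernels.

```python
import torch

# Representable magnitudes of the FP4 E2M1 format (an assumed number format;
# the abstract does not specify the exact FP4 encoding).
FP4_E2M1_LEVELS = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: torch.Tensor, group_size: int = 16) -> torch.Tensor:
    """Simulate fine-grained (group-wise) FP4 quantization of a 2-D tensor.

    Each contiguous group of `group_size` elements along the last dimension
    gets its own scale, which limits quantization error compared with a
    single per-tensor scale.
    """
    orig_shape = x.shape
    g = x.reshape(-1, group_size)                      # (num_groups, group_size)
    scale = g.abs().amax(dim=-1, keepdim=True) / 6.0   # map each group's max to the FP4 max
    scale = scale.clamp(min=1e-12)                     # avoid division by zero
    scaled = (g / scale).abs()
    # Round each magnitude to the nearest representable FP4 level.
    idx = (scaled.unsqueeze(-1) - FP4_E2M1_LEVELS).abs().argmin(dim=-1)
    q = FP4_E2M1_LEVELS[idx] * g.sign() * scale
    return q.reshape(orig_shape)

# Hypothetical usage on the weight and activation of a Transformer linear layer:
w = torch.randn(4096, 4096)
a = torch.randn(8, 4096)
y = fake_quant_fp4(a) @ fake_quant_fp4(w).t()          # simulated low-precision GEMM
```

In a mixed-precision setup as described in the abstract, only the GEMM operands would be cast this way, while accumulations, optimizer states, and sensitive modules would remain in higher precision.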
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Quantization
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 5178