Abstract: The increasing computational demands of training large language models (LLMs) call for more efficient methods, such as quantized training, which uses low-bit arithmetic operations to reduce costs. While FP8 precision has demonstrated its potential, leveraging FP4 remains challenging due to significant quantization errors and limited representational capability. Building on the Transformer architecture, we present an FP4 training scheme for LLMs that overcomes these obstacles by applying different quantization approaches to different modules and different stages of training. The framework ensures stability by incorporating mixed-precision training and fine-grained quantization methods. The Transformer's linear layers are particularly well suited to this low-precision training approach, facilitating efficient computation and scalability.
Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8 at a lower theoretical computational cost. With the advent of next-generation hardware supporting FP4, our framework lays the foundation for efficient ultra-low-precision training.
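To make the abstract's notion of fine-grained FP4 quantization of Transformer linear layers concrete, here is a minimal sketch in PyTorch. It simulates group-wise (per-group scale) FP4 fake quantization of the GEMM inputs; the E2M1 value set, the group size, and the function name `fake_quant_fp4` are illustrative assumptions and do not reproduce the paper's actual implementation or hardware kernels.

```python
import torch

# Representable magnitudes of the FP4 E2M1 format (an assumed number format;
# the abstract does not specify the exact FP4 encoding).
FP4_E2M1_LEVELS = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(x: torch.Tensor, group_size: int = 16) -> torch.Tensor:
    """Simulate fine-grained (group-wise) FP4 quantization of a 2-D tensor.

    Each contiguous group of `group_size` elements along the last dimension
    gets its own scale, which limits quantization error compared with a
    single per-tensor scale.
    """
    orig_shape = x.shape
    g = x.reshape(-1, group_size)                      # (num_groups, group_size)
    scale = g.abs().amax(dim=-1, keepdim=True) / 6.0   # map each group's max to the FP4 max
    scale = scale.clamp(min=1e-12)                     # avoid division by zero
    scaled = (g / scale).abs()
    # Round each magnitude to the nearest representable FP4 level.
    idx = (scaled.unsqueeze(-1) - FP4_E2M1_LEVELS).abs().argmin(dim=-1)
    q = FP4_E2M1_LEVELS[idx] * g.sign() * scale
    return q.reshape(orig_shape)

# Hypothetical usage on the weight and activation of a Transformer linear layer:
w = torch.randn(4096, 4096)
a = torch.randn(8, 4096)
y = fake_quant_fp4(a) @ fake_quant_fp4(w).t()          # simulated low-precision GEMM
```

In a mixed-precision setup as described in the abstract, only the GEMM operands would be cast this way, while accumulations, optimizer states, and sensitive modules would remain in higher precision.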
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Quantization
Contribution Types: Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 5178