Pre-training Large Language Models with Dynamic Precision: Low-Cost Computation with High-Fidelity Performance
Keywords: mixed-precision training, dynamic precision strategy, efficient LLM pre-training, gradient norm.
TL;DR: This paper proposes a GNMR-driven dynamic precision method that stabilizes low-precision training and narrows its performance gap with high-precision strategies.
Abstract: Mixed-precision training (MPT) is widely employed in large-scale deep learning to balance computational efficiency and model performance. However, low-precision schemes, including 8-bit and 4-bit formats, face challenges such as architecture dependence, numerical spikes, and reliance on static precision strategies. To address these issues, this paper proposes a dynamic precision method driven by the Gradient NorM Ratio (GNMR) metric. The GNMR index, together with its variant, the -GNMR index, enables real-time identification of high-risk operators and steps during low-precision training and triggers recovery to high precision when risk is detected. We provide a theoretical analysis that validates the proposed dynamic precision strategy. Experimental results on various models demonstrate that the method effectively improves the stability of mixed-precision training and narrows the performance gap between low-precision (4-bit, 8-bit) schemes and popular high-precision training strategies such as BF16.
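The abstract does not give the precise form of GNMR, so the following is only a minimal sketch of the general idea: it assumes, hypothetically, that the metric compares the current global gradient norm to a running reference, and that a spike in this ratio flags a step that should fall back to high precision. The class name, threshold, and EMA smoothing are illustrative assumptions, not the authors' implementation.

import torch

class GNMRController:
    """Hypothetical sketch of a gradient-norm-ratio precision controller.

    The paper's exact GNMR definition is not given in the abstract;
    here the ratio compares the current global gradient norm to an
    exponential moving average of past norms (an assumed form).
    """

    def __init__(self, threshold=4.0, beta=0.9):
        self.threshold = threshold  # assumed spike threshold, not from the paper
        self.beta = beta            # assumed EMA smoothing factor
        self.ref = None             # running reference gradient norm

    def gnmr(self, model):
        # Global L2 gradient norm, accumulated on the host as a Python float.
        total = 0.0
        for p in model.parameters():
            if p.grad is not None:
                total += float(p.grad.detach().float().norm()) ** 2
        norm = total ** 0.5
        if self.ref is None:
            self.ref = norm
        ratio = norm / max(self.ref, 1e-12)  # GNMR-style ratio (assumed form)
        self.ref = self.beta * self.ref + (1.0 - self.beta) * norm
        return ratio

    def use_high_precision(self, model):
        # Recover to high precision for the next step when the ratio spikes.
        return self.gnmr(model) > self.threshold

if __name__ == "__main__":
    # Tiny demo: one backward pass, then query the controller.
    model = torch.nn.Linear(16, 16)
    model(torch.randn(4, 16)).sum().backward()
    ctrl = GNMRController()
    print("fall back to high precision:", ctrl.use_high_precision(model))

In an actual training loop, the controller would be queried after backward() and a flagged step would run its matrix multiplications under a higher-precision path (e.g., BF16) instead of the low-precision kernels; genuine 8-bit or 4-bit execution depends on specialized kernels that this sketch does not model.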
Submission Number: 75