Keywords: Quantization, LLM training, Optimizer, Memory-efficiency
TL;DR: We propose Q-Adam-mini, a memory-efficient optimizer for LLM training that reduces GPU memory usage by 8× via INT8 momentum quantization while maintaining performance, and addresses weight norm instability in the embedding layer with stochastic rounding, enabling scalable training.
Abstract: We propose $\textbf{Q-Adam-mini}$, a memory-efficient optimizer for Large Language Model (LLM) training that achieves an $\textbf{8$\times$}$ reduction in GPU memory usage while maintaining performance parity with full-precision AdamW. Building upon Adam-mini, which reduces the memory footprint of optimizer states by 50\% compared to AdamW, we further improve memory efficiency through quantization of the optimizer states. We achieve this by: (i) quantizing the first-order momentum ($m$) to $\textbf{INT8}$ and (ii) retaining the second-order momentum ($v$) in $\textbf{FP32}$, which occupies less than 1\% of total memory. However, the embedding layer exhibits weight norm instability under momentum quantization. We analyze this issue and address it by applying stochastic rounding for momentum quantization exclusively to the embedding layer. We validate our approach on both pre-training and fine-tuning tasks, with model sizes ranging from 60M to 8B parameters. Our results demonstrate that Q-Adam-mini enables scalable LLM training with limited computational resources. Code is available at: https://github.com/LouisCroix/Q-Adam-mini
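To make the two quantization choices concrete, below is a minimal sketch (not the authors' implementation) of INT8 momentum quantization with optional stochastic rounding. It assumes simple per-tensor absmax scaling; the function names and scaling granularity are hypothetical, and the released code may differ (e.g., block-wise scales).

```python
# Hypothetical sketch of INT8 momentum quantization with optional
# stochastic rounding; per-tensor absmax scaling is an assumption.
import torch

def quantize_int8(m: torch.Tensor, stochastic: bool = False):
    """Quantize an FP32 momentum tensor to INT8 plus an FP32 scale."""
    scale = m.abs().max().clamp(min=1e-8) / 127.0  # absmax scaling
    scaled = m / scale
    if stochastic:
        # Stochastic rounding: round up with probability equal to the
        # fractional part, so the quantization error is unbiased in
        # expectation (used here only for the embedding layer).
        floor = scaled.floor()
        prob = scaled - floor
        scaled = floor + (torch.rand_like(prob) < prob).float()
    else:
        scaled = scaled.round()  # deterministic round-to-nearest
    q = scaled.clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP32 approximation of the momentum for the update step."""
    return q.float() * scale

# Usage: round-to-nearest for most layers, stochastic rounding for the
# embedding layer; the second-order momentum v stays in FP32 throughout.
m = torch.randn(4096)
q, s = quantize_int8(m, stochastic=True)
m_hat = dequantize_int8(q, s)
```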
Submission Number: 67