Periodical Moving Average Accelerates Gradient Accumulation for Post-Training

Published: 07 May 2025, Last Modified: 13 Jun 2025, UAI 2025 Poster, License: CC BY 4.0
Keywords: Convex Optimization, Non-Convex Optimization, Large Language Models, Memory-Efficient Training
TL;DR: Periodical Moving Average is an extension for momentum-based optimizers to accelerate post-training on memory-limited devices.
Abstract: High gradient variance presents a significant obstacle to efficient post-training of large language models (LLMs) on memory-constrained devices. Existing practical strategies—such as reducing batch sizes or adopting gradient accumulation (GA)—suffer from an inherent trade-off: smaller batches exacerbate convergence issues due to increased gradient noise, while GA substantially prolongs training time owing to its sequential processing. In this work, we reveal that the Exponential Moving Average (EMA) in momentum-based optimizers exponentially discounts historical gradients, thereby limiting their effectiveness in stabilizing parameter updates, especially during post-training when parameter drift is minimal. Motivated by this, we propose integrating the core idea of GA directly into momentum updates via a novel Periodical Moving Average (PMA) mechanism, which structures training into fixed periods and replaces EMA with a uniform moving average within each period. We instantiate PMA within AdamW and Lion, resulting in the AdamW-PMA and Lion-PMA optimizers. Theoretical analysis establishes that AdamW-PMA matches the convergence guarantees of standard Adam. Extensive empirical evaluation on supervised fine-tuning and direct preference optimization tasks demonstrates that PMA-based methods achieve approximately $2\times$ faster training compared to GA, while yielding consistently better performance on downstream evaluations.
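To make the core idea concrete, below is a minimal, hypothetical sketch of a PMA-style first-moment update: instead of an exponential moving average, gradients are averaged with equal weight inside each fixed-length period, and the average is reset when a new period begins. The function name `pma_first_moment` and the toy usage are illustrative assumptions, not the authors' AdamW-PMA or Lion-PMA implementations; see the linked repository for the actual optimizers.

```python
# Hypothetical sketch of a periodical-moving-average (PMA) first moment.
# Assumes PyTorch; names and structure are illustrative only.
import torch


def pma_first_moment(grads, period):
    """Yield a uniform (equal-weight) running average of gradients,
    reset at the start of every `period`-step window.

    grads:  iterable of gradient tensors, one per optimization step.
    period: number of steps per period.
    """
    m = None
    for t, g in enumerate(grads):
        k = t % period          # position inside the current period
        if k == 0:
            m = g.clone()       # new period: reset the average
        else:
            # incremental uniform mean over the k+1 gradients seen so far
            m = m + (g - m) / (k + 1)
        yield m


# Toy usage: three periods of length 4 over random gradients.
if __name__ == "__main__":
    torch.manual_seed(0)
    grads = [torch.randn(3) for _ in range(12)]
    for t, m in enumerate(pma_first_moment(grads, period=4)):
        print(t, m)
```

Within a period this weights all gradients equally, mimicking gradient accumulation, while still producing an update direction at every step; a full optimizer would additionally apply the second-moment normalization (AdamW) or sign update (Lion) described in the paper.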
Supplementary Material: zip
Latex Source Code: zip
Code Link: https://github.com/liuyumou/periodical-moving-average.git
Signed PMLR Licence Agreement: pdf
Submission Number: 237