Both Asymptotic and Non-Asymptotic Convergence of Quasi-Hyperbolic Momentum using Increasing Batch Size
Abstract: Momentum methods were originally introduced because of their superiority to stochastic gradient descent (SGD) in deterministic settings with convex objective functions. However, despite their widespread application to deep neural networks — a representative case of stochastic nonconvex optimization — the theoretical justification for their effectiveness in such settings remains limited. Quasi-hyperbolic momentum (QHM) is an algorithm that generalizes various momentum methods and has been studied to better understand the class of momentum-based algorithms as a whole. In this paper, we provide both asymptotic and non-asymptotic convergence results for mini-batch QHM with an increasing batch size. We show that achieving asymptotic convergence requires either a decaying learning rate or an increasing batch size. Since a decaying learning rate adversely affects non-asymptotic convergence, we demonstrate that using mini-batch QHM with an increasing batch size — without decaying the learning rate — can be a more effective strategy. Our experiments show that even a finite increase in batch size can provide benefits for training neural networks. The code is available at https://github.com/iiduka-researches/qhm_acml25.
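For concreteness, the sketch below illustrates the standard QHM update rule (momentum buffer d_t = beta * d_{t-1} + (1 - beta) * g_t, followed by theta update along (1 - nu) * g_t + nu * d_t) combined with an increasing batch-size schedule and a fixed learning rate, in the spirit of the strategy described in the abstract. The function name, the geometric doubling schedule, and the `grad_fn` interface are illustrative assumptions, not the paper's implementation or its released code.

```python
import numpy as np

def qhm_minibatch(grad_fn, theta0, data, *, lr=0.1, beta=0.9, nu=0.7,
                  batch0=32, growth=2.0, grow_every=100, steps=500, seed=0):
    """Minimal sketch: mini-batch QHM with an increasing batch size.

    QHM update (Ma & Yarats, 2019):
        d_t   = beta * d_{t-1} + (1 - beta) * g_t
        theta = theta - lr * ((1 - nu) * g_t + nu * d_t)
    where g_t is a mini-batch gradient estimate.  The batch-size
    schedule here (doubling every `grow_every` steps, learning rate
    held fixed) is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    d = np.zeros_like(theta)                              # momentum buffer
    batch = batch0
    for t in range(steps):
        if t > 0 and t % grow_every == 0:
            batch = int(min(len(data), batch * growth))   # grow batch size, keep lr fixed
        idx = rng.choice(len(data), size=min(batch, len(data)), replace=False)
        g = grad_fn(theta, data[idx])                     # mini-batch gradient estimate
        d = beta * d + (1.0 - beta) * g                   # EMA of gradients (momentum)
        theta = theta - lr * ((1.0 - nu) * g + nu * d)    # quasi-hyperbolic step
    return theta
```

Note that nu = 1 recovers SGD with (normalized) momentum and nu = 0 recovers plain mini-batch SGD, which is why QHM serves as a common framework for analyzing momentum-based methods.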
Supplementary Material: zip
Submission Number: 51