A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch SizesDownload PDF

Published: 28 Jan 2022, Last Modified: 04 May 2025ICLR 2022 SubmittedReaders: Everyone
Keywords: neural networks, deep learning, neural network optimization, hyperparameter tuning, optimizer comparison
Abstract: Recently the LARS and LAMB optimizers have been proposed for training neural networks faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update rules of Heavy-ball momentum and Adam, respectively, and have become popular in prominent benchmarks and deep learning libraries. However, without fair comparisons to standard optimizers, it remains an open question whether LARS and LAMB have any benefit over traditional, generic algorithms. In this work we demonstrate that standard optimization algorithms such as Nesterov momentum and Adam can match or exceed the results of LARS and LAMB at large batch sizes. Our results establish new, stronger baselines for future comparisons at these batch sizes and shed light on the difficulties of comparing optimizers for neural network training more generally.
One-sentence Summary: We retune the Nesterov/Adam optimizers on pipelines where LARS/LAMB are commonly used and achieve similar or better performance, providing competitive baselines for the large batch training setting.
Supplementary Material: zip
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 8 code implementations](https://www.catalyzex.com/paper/a-large-batch-optimizer-reality-check/code)
12 Replies

Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview