LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

Published: 22 Jan 2025, Last Modified: 01 Mar 2025 | ICLR 2025 Poster | CC BY 4.0
Keywords: adaptive optimization, memory efficiency, low-rank learning, low-rank compression, convergence rates
TL;DR: We propose a new optimizer, LDAdam, that reduces the optimizer's memory footprint by performing low-rank updates while exploring the full parameter space.
Abstract: We introduce LDAdam, a memory-efficient optimizer for training large models that performs adaptive optimization steps within lower-dimensional subspaces while consistently exploring the full parameter space during training. This strategy keeps the optimizer's memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows for transitioning between subspaces, i.e., estimating the statistics of the projected gradients. To mitigate the errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism, which explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions, and provide empirical evidence that LDAdam allows for efficient fine-tuning and pre-training of language models.
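The sketch below illustrates the kind of step the abstract describes: an Adam-style update whose moments live in a low-dimensional subspace, with a projection-aware transition of the optimizer states when the subspace changes and an error-feedback accumulator for the compression residual. This is a minimal, hypothetical reading of the abstract, not the authors' implementation: the choice of subspace (top-r left singular vectors), the rotation of the moments via P_new^T P_old, the second-moment transport, and all function and variable names are assumptions made for illustration only; bias correction is omitted for brevity.

```python
# Minimal sketch (assumptions noted above; not the LDAdam reference code) of a
# low-rank Adam-style step with projection-aware state transition and error feedback.
import numpy as np

def ldadam_like_step(W, grad, state, r=4, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Error feedback: add back what previous low-rank compressions discarded.
    acc = grad + state.get("error", np.zeros_like(grad))

    # Choose a new low-dimensional subspace (here: top-r left singular vectors; an assumption).
    U, _, _ = np.linalg.svd(acc, full_matrices=False)
    P_new = U[:, :r]                      # (n, r) orthonormal basis of the subspace

    # Project the gradient and record the compression error for the next step.
    g_low = P_new.T @ acc                 # (r, m) low-dimensional gradient statistics
    state["error"] = acc - P_new @ g_low  # residual discarded by the projection

    # Projection-aware transition of optimizer states between old and new subspaces.
    if "m" in state:
        R = P_new.T @ state["P"]          # (r, r) map from old basis to new basis
        m = R @ state["m"]
        v = (R ** 2) @ state["v"]         # crude second-moment transport (assumption)
    else:
        m = np.zeros_like(g_low)
        v = np.zeros_like(g_low)

    # Standard Adam moment updates, but in the r-dimensional subspace.
    m = beta1 * m + (1 - beta1) * g_low
    v = beta2 * v + (1 - beta2) * g_low ** 2
    update_low = m / (np.sqrt(v) + eps)

    # Map the update back to the full parameter space and apply it.
    W -= lr * (P_new @ update_low)

    state.update({"P": P_new, "m": m, "v": v})
    return W, state
```

Because only the r-by-m moments, the n-by-r basis, and the error accumulator are stored, the optimizer state scales with the chosen rank r rather than with the full parameter matrix, which is the memory saving the abstract refers to.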
Supplementary Material: zip
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4387
