CLEAN: Curvature-Aware Memory-Efficient Optimizer via Nyström Sketching

20 Sept 2025 (modified: 03 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: low-memory optimizer, low-rank approximation, Nyström approximation, preconditioner
Abstract: Training large language models is constrained by a trade-off between optimizer memory and curvature information. Memory-saving optimizers often discard valuable second-order information, leading to slower convergence, while full-matrix methods are prohibitively expensive. We introduce \textbf{CLEAN}, a curvature-aware, memory-efficient optimizer that resolves this dilemma. \textbf{CLEAN} approximates the left and right gradient covariances with randomized Nyström sketches, enabling balanced, two-sided preconditioning. Optimization proceeds by computing updates within a compact, low-rank subspace and projecting them back to the full parameter space, capturing rich curvature information at minimal memory cost. A key innovation in \textbf{CLEAN} is a projection-aware moment-transport mechanism: as the low-rank subspace evolves, this transport realigns the optimizer's first and second moments to the new basis, which is critical for maintaining stability and avoiding the performance degradation caused by stale statistics. \textbf{CLEAN}'s memory footprint is orders of magnitude smaller than Adam's, growing only linearly in the matrix dimensions rather than in their product. Our experiments show that \textbf{CLEAN} is highly effective for fine-tuning, outperforming strong memory-efficient baselines, while also remaining competitive in pre-training scenarios.
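The abstract names three ingredients: randomized Nyström sketches of the left and right gradient covariances, Adam-style moments kept in the resulting low-rank subspace, and a projection-aware transport of those moments when the subspace is refreshed. Since no pseudocode is given, the following NumPy sketch illustrates one way these pieces could fit together; the function names (`nystrom_eig`, `clean_step`), the refresh schedule, the hyperparameters, and in particular the squared-rotation heuristic for the second moment are our own assumptions, not the authors' algorithm.

```python
import numpy as np

def nystrom_eig(matvec, dim, rank, shift=1e-6, seed=0):
    """Randomized Nystrom approximation A ~= U @ diag(lam) @ U.T of a PSD
    operator accessible only through products X -> A @ X (in the style of
    Frangella, Tropp & Udell, "Randomized Nystrom Preconditioning")."""
    rng = np.random.default_rng(seed)
    omega, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    Y = matvec(omega)                        # sketch: Y = A @ Omega
    nu = shift * np.linalg.norm(Y)           # stabilizing shift
    Y = Y + nu * omega
    M = omega.T @ Y                          # Omega^T A Omega + nu * I
    C = np.linalg.cholesky(0.5 * (M + M.T))  # symmetrize, then M = C C^T
    B = np.linalg.solve(C, Y.T).T            # B = Y C^{-T}
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U, np.maximum(s**2 - nu, 0.0)     # rank-r eigenpairs of A

def clean_step(W, G, state, rank=4, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, refresh=False):
    """One illustrative two-sided, subspace update for a matrix parameter W
    with gradient G. Optimizer state is r(m+n) + 2r^2 numbers per layer,
    versus Adam's 2mn. Bias correction is omitted for brevity."""
    m, n = G.shape
    if refresh or "UL" not in state:
        UL, _ = nystrom_eig(lambda X: G @ (G.T @ X), m, rank)  # left cov  G G^T
        UR, _ = nystrom_eig(lambda X: G.T @ (G @ X), n, rank)  # right cov G^T G
        if "UL" in state:
            # Projection-aware moment transport: realign stored moments with
            # the new basis via the subspace rotations Q = U_new^T U_old.
            QL, QR = UL.T @ state["UL"], UR.T @ state["UR"]
            state["m"] = QL @ state["m"] @ QR.T
            # Squared-rotation heuristic for the elementwise second moment
            # (an assumption; the abstract does not spell out this rule).
            state["v"] = (QL**2) @ state["v"] @ (QR.T**2)
        state["UL"], state["UR"] = UL, UR
        state.setdefault("m", np.zeros((rank, rank)))
        state.setdefault("v", np.zeros((rank, rank)))
    UL, UR = state["UL"], state["UR"]
    Gs = UL.T @ G @ UR                       # compress gradient to r x r
    b1, b2 = betas
    state["m"] = b1 * state["m"] + (1 - b1) * Gs       # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * Gs**2    # second moment
    step = state["m"] / (np.sqrt(state["v"]) + eps)    # Adam-style step
    return W - lr * (UL @ step @ UR.T)       # project back to full space

# Toy usage with stand-in gradients; the refresh period is arbitrary.
rng = np.random.default_rng(1)
W, state = rng.standard_normal((64, 32)), {}
for t in range(10):
    G = rng.standard_normal((64, 32))
    W = clean_step(W, G, state, rank=4, refresh=(t % 5 == 0))
```

The transport step makes concrete the stale-statistics problem the abstract highlights: without it, the r × r moments accumulated in the old basis would be applied in a new, rotated coordinate system, which is exactly the instability the projection-aware mechanism is meant to avoid.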
Supplementary Material: pdf
Primary Area: optimization
Submission Number: 23169