Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning

Published: 26 Jan 2026, Last Modified: 01 Mar 2026 · ICLR 2026 Poster · CC BY 4.0
Keywords: Block-Coordinate Optimization, SignSGD, Large Language Models (LLMs), Memory-Efficient Fine-Tuning
Abstract: We propose ABSignSGD, a block-coordinate variant of sign-based descent with flexible block selection that enables memory- and runtime-efficient full-parameter fine-tuning of large language models. We present a unified convergence analysis under mild conditions, covering both the base method and a majority-vote extension for distributed training. The latter improves communication efficiency by aggregating only gradient signs rather than averaging full gradients. Experiments on Qwen3-8B, Llama3-8B, and Qwen3-32B, spanning mathematical reasoning and general instruction-following tasks, show that ABSignSGD converges faster per iteration and delivers superior downstream performance while reducing both runtime and memory usage compared to existing methods. Ablation studies further indicate that the memoryless sign-based update naturally complements block-wise updates, explaining the method's strong empirical performance.
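To make the core idea concrete, here is a minimal, hypothetical sketch of a block-coordinate sign-based update: at each step, a single parameter block is selected and updated with the sign of its gradient, so no optimizer state (momentum or second moments) needs to be stored. This is an illustrative toy in NumPy, not the paper's actual implementation; the function name `absignsgd_step`, the block-selection scheme, and the learning rate are assumptions for the example.

```python
import numpy as np

def absignsgd_step(params, grads, block, lr):
    """Apply a memoryless sign-based update to one parameter block only.

    params : 1-D array of model parameters
    grads  : 1-D array of gradients (same shape as params)
    block  : slice selecting the active block this step
    lr     : step size
    """
    new_params = params.copy()
    # Only the selected block moves; every coordinate in it steps by +/- lr.
    new_params[block] -= lr * np.sign(grads[block])
    return new_params

# Toy example: update the first half of an 8-dimensional parameter vector.
rng = np.random.default_rng(0)
w = rng.normal(size=8)
g = rng.normal(size=8)
w_new = absignsgd_step(w, g, block=slice(0, 4), lr=0.1)
```

Because the update uses only the gradient sign, each coordinate in the active block moves by exactly the learning rate in magnitude, and parameters outside the block are untouched; this is what makes the method cheap in both memory (no optimizer state) and communication (one sign bit per coordinate in the distributed majority-vote variant).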
Primary Area: optimization
Submission Number: 10064