Keywords: LLM; deep learning; zeroth-order optimizer
TL;DR: We propose HiZOO to leverage the diagonal Hessian to enhance ZOO for fine-tuning LLMs.
Abstract: Fine-tuning large language models (LLMs) is necessary for specific downstream tasks, but classic first-order optimizers entail prohibitive GPU memory costs because of back-propagation. Recent works such as MeZO have turned to zeroth-order optimizers for fine-tuning, which substantially reduce memory by using only two forward passes. However, heterogeneous curvatures across different parameter dimensions in LLMs often cause convergence instability or even failure. In this work, we propose HiZOO, a diagonal Hessian informed Zeroth-Order Optimizer, which is the first work to leverage the diagonal Hessian to enhance ZOO for fine-tuning LLMs. We provide a theoretical analysis of HiZOO and visualize its optimization trajectories on test functions to illustrate how it improves convergence under heterogeneous curvatures. Extensive experiments on various models (RoBERTa, OPT, Phi-2 and Llama3, with 350M$\sim$66B parameters) show that HiZOO significantly reduces the number of training steps and improves model accuracy, while retaining the memory advantage of ZOO. For example, on the SST2 task HiZOO achieves an $8\times$ speedup and better accuracy than MeZO across different models. We also propose HiZOO-L, which reduces the Hessian memory cost to 10\% of that of MeZO while maintaining almost the same performance. Compared with ZO-Adam, HiZOO-L achieves a 4.3\% improvement while using only 50\% of the GPU memory. Code is available at https://anonymous.4open.science/r/HiZOO-27F8.
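Illustrative sketch (not part of the submission's code): the abstract describes a zeroth-order, forward-pass-only update that is preconditioned by a diagonal Hessian estimate. The snippet below is a minimal, hypothetical NumPy sketch of such a scheme, assuming an SPSA-style two-sided estimator plus one extra center pass for the curvature term; all names (hizoo_step, sigma_diag, alpha) and the specific curvature/EMA formulas are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def hizoo_step(params, loss_fn, sigma_diag, lr=1e-3, mu=1e-3, alpha=0.1):
    """One diagonal-Hessian-informed zeroth-order step (no back-propagation).

    params     : flat parameter vector (np.ndarray)
    loss_fn    : callable mapping params -> scalar loss (one forward pass)
    sigma_diag : running diagonal curvature estimate (positive entries)
    mu         : perturbation scale; alpha: EMA rate for the curvature estimate
    """
    z = np.random.randn(*params.shape)            # random probe direction
    # Scale the probe by the inverse curvature so flat directions are
    # explored more aggressively than sharp ones.
    scaled_z = z / np.sqrt(sigma_diag)

    loss_plus = loss_fn(params + mu * scaled_z)   # forward pass 1
    loss_minus = loss_fn(params - mu * scaled_z)  # forward pass 2
    loss_center = loss_fn(params)                 # extra pass, curvature only

    # SPSA-style directional gradient estimate.
    grad_est = (loss_plus - loss_minus) / (2 * mu) * scaled_z

    # Crude second-difference curvature estimate, kept positive and
    # smoothed with an exponential moving average (illustrative choice).
    curv = np.abs(loss_plus + loss_minus - 2 * loss_center) / (mu ** 2)
    sigma_diag = (1 - alpha) * sigma_diag + alpha * (curv * z ** 2 + 1e-8)

    # Preconditioned parameter update.
    params = params - lr * grad_est / np.sqrt(sigma_diag)
    return params, sigma_diag
```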
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8732