Abstract: Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, the method comes at the cost of storing a perturbation of the model parameters, which can be restrictive in memory-bound settings. We design a variant of SAM, called $\nu$SAM, which obtains a low-rank perturbation by modifying the perturbation constraint. The update almost entirely removes the memory footprint of the perturbation without increasing the computational complexity, thus saving close to $1/3$ of the parameter-related memory when SGD is used as the base optimizer. We demonstrate that $\nu$SAM performs comparably to SAM on vision transformers, both when training models from scratch and when fine-tuning. Interestingly, $\nu$SAM appears to significantly improve performance for MLP-Mixer architectures in both settings. We corroborate these results theoretically by showing that SAM with an \emph{arbitrary} norm choice (which includes $\nu$SAM) can converge even with a fixed perturbation radius.
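As context for the memory overhead the abstract refers to, the following is a minimal PyTorch-style sketch of the standard two-step SAM update (in the spirit of Foret et al.), not the $\nu$SAM variant proposed in this submission; the function name `sam_step`, its arguments, and the default `rho` are illustrative assumptions rather than details from the paper.

```python
# Minimal sketch of the standard SAM two-step update, shown only to
# illustrate where the parameter-sized perturbation buffer appears.
# This is NOT the submission's nu-SAM method; names are illustrative.
import torch

def sam_step(model, loss_fn, batch, base_opt, rho=0.05):
    inputs, targets = batch

    # First forward/backward pass: gradient at the current parameters w.
    loss_fn(model(inputs), targets).backward()

    # Build the ascent perturbation e = rho * g / ||g||_2 and store it.
    # This per-parameter buffer is the memory overhead that a low-rank
    # perturbation would (almost) remove.
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2)
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                      # move to the perturbed point w + e
            perturbations.append((p, e))

    # Second forward/backward pass: gradient at the perturbed point.
    base_opt.zero_grad()
    loss_fn(model(inputs), targets).backward()

    # Undo the perturbation, then apply the base optimizer (e.g. SGD)
    # using the gradient computed at w + e.
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_opt.step()
    base_opt.zero_grad()
```

In this sketch, `perturbations` holds one extra tensor per parameter; together with the weights themselves and SGD's momentum buffer, that amounts to roughly three parameter-sized arrays, which appears to be the accounting behind the "close to $1/3$" memory saving stated in the abstract.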
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Konstantin_Mishchenko1
Submission Number: 3459