Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

Published: 18 Jun 2024 · Last Modified: 18 Jul 2024 · TF2M 2024 Poster · CC BY 4.0
Keywords: sharpness aware minimization, implicit regularization, computational efficiency, finetuning, LoRA
TL;DR: We study the implicit regularization of sharpness-aware minimization and make it explicit to obtain computational and generalization benefits.
Abstract: Sharpness-aware minimization (SAM) improves the generalization of various deep learning tasks. Motivated by popular architectures such as LoRA, we explore the implicit regularization of SAM for scale-invariant problems involving two groups of variables. Instead of focusing on the commonly used notion of *sharpness*, this work introduces a concept termed *balancedness*, defined as the difference between the squared norms of the two variables. This allows us to depict richer global behaviors of SAM. In particular, our theoretical and empirical findings reveal that i) SAM promotes balancedness; and ii) the regularization on balancedness is *data-responsive* -- outliers have a stronger impact. The latter coincides with empirical observations that SAM outperforms SGD in the presence of outliers. Leveraging this implicit regularization, we develop a resource-efficient variant of SAM, balancedness-aware regularization (BAR), tailored for scale-invariant problems such as finetuning language models with LoRA. BAR saves 95% of SAM's computational overhead while enhancing test performance across various tasks on RoBERTa, GPT2, and OPT-1.3B.
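As a minimal sketch of the quantities named in the abstract (the symbols $x_t$, $y_t$, $B_t$ are our notation, not taken from the paper): for a problem with two variable groups whose loss depends only on their product, as in LoRA, scale invariance and balancedness can be written as

```latex
% Hedged sketch; notation (x_t, y_t) for the two variable groups is an assumption.
% Scale invariance of the loss f, e.g. LoRA-style factorizations W = x y^T:
\[
  f(c\,x,\; y/c) \;=\; f(x,\; y), \qquad \forall\, c > 0 .
\]
% Balancedness at iteration t: the difference of the squared norms of the two groups.
\[
  B_t \;:=\; \|x_t\|^2 \;-\; \|y_t\|^2 .
\]
% The abstract's claim "SAM promotes balancedness" can then be read as SAM
% implicitly driving |B_t| toward zero, with the strength of this effect
% depending on the data (outliers exert a stronger pull).
```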
Submission Number: 20