Understanding Sharpness-Aware Minimization

29 Sept 2021, 00:34 (modified: 23 Nov 2021, 11:05) · ICLR 2022 Submission
Keywords: Sharpness-aware minimization, implicit bias, noisy labels, adversarial training
Abstract: Sharpness-Aware Minimization (SAM) is a recent training method that relies on worst-case weight perturbations. SAM significantly improves generalization in various settings; however, existing justifications for its success do not seem conclusive. First, we analyze the implicit bias of SAM over diagonal linear networks and prove that it always chooses a solution that enjoys better generalization properties than standard gradient descent. We also provide a convergence proof of SAM for non-convex objectives when used with stochastic gradients, and empirically discuss the convergence and generalization behavior of SAM for deep networks. Next, we discuss why SAM can be helpful in the noisy-label setting, where we first show that it can improve generalization even for linear classifiers. We then discuss a gradient-reweighting interpretation of SAM and show a further beneficial effect of combining SAM with a robust loss. Finally, we draw parallels between the overfitting observed in learning with noisy labels and in adversarial training, where SAM also improves generalization. This connection suggests that, more generally, techniques from the noisy-label literature can be useful for improving robust generalization.
One-sentence Summary: We discuss and study multiple aspects of SAM: its implicit bias, convergence, effect on noisy labels and on robust overfitting
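The worst-case weight perturbation that the abstract refers to can be sketched as the standard two-step SAM update: an ascent step along the normalized gradient to find the perturbed weights, followed by a descent step using the gradient at those perturbed weights. The sketch below is a minimal illustration on a toy quadratic loss, not the authors' implementation; the names `sam_step`, `rho`, and `lr` are illustrative.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss f(w) = 0.5 * ||w||^2.
    return w

def sam_step(w, lr=0.1, rho=0.05):
    # Ascent step: move rho along the normalized gradient direction
    # (the first-order approximation of the worst-case perturbation).
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent step: update using the gradient at the perturbed weights.
    g_perturbed = grad(w + eps)
    return w - lr * g_perturbed

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
# On this convex toy loss, w shrinks toward the minimum at the origin.
```

In practice the two gradients are computed with two forward-backward passes per batch, which is the main overhead of SAM over standard SGD.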