Keywords: Sharpness-Aware Minimization, Batch Normalization
TL;DR: Perturbing only the BatchNorm parameters in the adversarial step benefits several SAM-variants.
Abstract: We investigate the connection between two popular methods commonly used in training deep neural networks: Sharpness-Aware Minimization (SAM) and Batch Normalization. We find that perturbing \textit{only} the affine BatchNorm parameters in the adversarial step of SAM improves generalization performance, while excluding them from the perturbation can strongly degrade performance. We confirm our results across several models and SAM-variants on CIFAR-10 and CIFAR-100 and show preliminary results for ImageNet. Our results provide a practical tweak for training deep networks, but also cast doubt on the commonly accepted explanation that SAM's benefits stem from minimizing a sharpness quantity responsible for generalization.
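To illustrate the tweak described in the abstract, here is a minimal sketch of a SAM-style update in which the adversarial (ascent) perturbation is restricted to the affine BatchNorm parameters, assuming a standard PyTorch training setup. The helper names `bn_affine_params` and `sam_bn_only_step`, the `rho` radius, and the two-pass structure follow the generic SAM recipe rather than any specific implementation from the paper.

```python
# Hypothetical sketch: SAM where only BatchNorm affine parameters (weight/bias)
# are perturbed in the adversarial step; all parameters receive the descent step.
import torch
import torch.nn as nn


def bn_affine_params(model: nn.Module):
    """Yield only the affine (weight/bias) parameters of BatchNorm layers."""
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            if module.weight is not None:
                yield module.weight
            if module.bias is not None:
                yield module.bias


def sam_bn_only_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM update with the perturbation restricted to BN affine parameters."""
    bn_params = list(bn_affine_params(model))

    # First forward/backward pass: gradients at the current point.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Norm of the gradient restricted to BN parameters, then ascent step
    # e = rho * g / ||g|| applied to those parameters only.
    grads = [p.grad for p in bn_params if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm(p=2) for g in grads]), p=2) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in bn_params:
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)

    # Second forward/backward pass at the perturbed point.
    model.zero_grad()
    loss_fn(model(x), y).backward()

    # Undo the perturbation before the descent step on all parameters.
    with torch.no_grad():
        for p, e in zip(bn_params, perturbations):
            if e is not None:
                p.sub_(e)

    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```

In this reading, "excluding" the BatchNorm parameters would correspond to leaving them out of `bn_params` while perturbing everything else, which is the opposite restriction to the one sketched above.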