Robustness to Adversarial Gradients: A Glimpse Into the Loss Landscape of Contrastive Pre-training

26 May 2022, 20:09 (modified: 23 Jul 2022, 02:25)
ICML 2022 Pre-training Workshop
Readers: Everyone
Keywords: contrastive, flatness, loss-landscape, curvature, pre-training, sharpness, minima
TL;DR: We analyze the loss landscape of contrastively trained models with a computationally efficient measure of sharpness, and find that SimCLR pre-training results in flatter optima.
Abstract: An in-depth understanding of deep neural network generalization can allow machine learning practitioners to design systems more robust to class balance shift, adversarial attacks, and data drift. However, the reasons for better generalization are not fully understood. Recent work provides empirical arguments suggesting that flat minima generalize better. While recently proposed contrastive pre-training methods have also been shown to improve generalization, there is an incomplete understanding of the loss landscape of these models and why they generalize well. In this work, we analyze the loss landscape of contrastively trained models on the CIFAR-10 dataset by looking at three sharpness measures: (1) the approximate eigenspectrum of the Hessian, (2) (Cε, A)-sharpness, and (3) robustness to adversarial gradients (RAG), a new efficient measure of sharpness. Our findings suggest that models fine-tuned after contrastive training favor flatter solutions relative to baseline classifiers trained with a supervised objective. In addition, our proposed metric yields findings consistent with existing works, demonstrating the impact of learning rate and batch size on minima sharpness.
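
As a rough illustration of the first measure listed in the abstract, the sketch below estimates the largest Hessian eigenvalue of a loss by power iteration on Hessian-vector products. It is not the authors' implementation: the function names (`grad`, `hvp`, `top_hessian_eigenvalue`) and the two toy quadratic losses are hypothetical, finite differences stand in for automatic differentiation, and the abstract does not say which approximation the paper uses for the eigenspectrum. The RAG measure is not sketched because its definition is not given here.

```python
import math
import random


def grad(loss, w, h=1e-5):
    """Central-difference gradient of `loss` at the point `w` (a list of floats)."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((loss(wp) - loss(wm)) / (2 * h))
    return g


def hvp(loss, w, v, eps=1e-3):
    """Hessian-vector product H(w) @ v, from a central difference of gradients."""
    gp = grad(loss, [wi + eps * vi for wi, vi in zip(w, v)])
    gm = grad(loss, [wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]


def top_hessian_eigenvalue(loss, w, iters=100, seed=0):
    """Power iteration: returns the Hessian eigenvalue of largest magnitude."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(loss, w, v)
        lam = sum(a * b for a, b in zip(v, hv))  # Rayleigh quotient v^T H v
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return lam


# Hypothetical toy losses with a minimum at the origin.
# The "sharp" one has curvatures 10 and 1; the "flat" one has 1 and 0.5.
def sharp_loss(w):
    return 0.5 * (10.0 * w[0] ** 2 + 1.0 * w[1] ** 2)


def flat_loss(w):
    return 0.5 * (1.0 * w[0] ** 2 + 0.5 * w[1] ** 2)


lam_sharp = top_hessian_eigenvalue(sharp_loss, [0.0, 0.0])
lam_flat = top_hessian_eigenvalue(flat_loss, [0.0, 0.0])
print("sharp minimum, top eigenvalue:", lam_sharp)
print("flat minimum, top eigenvalue:", lam_flat)
```

A smaller top eigenvalue at the minimum corresponds to a flatter solution in this sense. For a real network, the finite-difference gradient would be replaced by backpropagation and the loss evaluated on data batches, which is an assumption about typical practice rather than a description of the paper's setup.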