How to fine-tune vision models with SGD

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: fine-tuning, SGD, freezing layers, distribution shift
TL;DR: SGD can do worse than AdamW under distribution shifts, but simple changes make SGD competitive
Abstract: SGD (with momentum) and AdamW are the two most commonly used optimizers for fine-tuning large neural networks in computer vision. When the two methods perform the same, SGD is preferable because it uses less memory and is more efficient than AdamW. However, when evaluating on downstream tasks that differ significantly from pretraining, we find that across five popular benchmarks SGD fine-tuning gets substantially lower accuracies than AdamW on many modern vision models such as Vision Transformers and ConvNeXts, especially out-of-distribution (OOD). We find that such large gaps arise in instances where the fine-tuning gradients in the first ("embedding") layer are much larger than those in the rest of the model. Our analysis suggests an easy fix: if we simply freeze the embedding layer (0.7% of the parameters), SGD performs competitively with AdamW while using less memory across a suite of benchmarks. Our insights lead to state-of-the-art accuracies on popular distribution shift benchmarks: WILDS-FMoW, WILDS-Camelyon, BREEDS-Living-17, Waterbirds, and DomainNet.
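
The abstract's proposed fix amounts to freezing the first ("embedding") layer and fine-tuning everything else with SGD plus momentum. Below is a minimal PyTorch sketch of that idea; the timm model name, the patch_embed attribute, and the hyperparameters are illustrative assumptions, not the authors' exact recipe.

    # Sketch only: assumes a timm Vision Transformer whose first layer is `patch_embed`.
    import timm
    import torch

    model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=10)

    # Freeze the embedding layer (roughly 0.7% of the parameters for ViT-B/16).
    for p in model.patch_embed.parameters():
        p.requires_grad = False

    # Fine-tune the remaining parameters with plain SGD + momentum (illustrative lr).
    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad),
        lr=1e-3,
        momentum=0.9,
    )

    # Standard training loop (sketch):
    # for images, labels in loader:
    #     loss = torch.nn.functional.cross_entropy(model(images), labels)
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()

Compared with AdamW, this keeps only one momentum buffer per trainable parameter (and none for the frozen embedding), which is the memory saving the abstract refers to.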
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
