An Empirical Study of Pre-trained Vision Models on Out-of-distribution Generalization

09 Oct 2021, 14:49 (modified: 01 Dec 2021, 10:06); NeurIPS 2021 Workshop DistShift Poster; Readers: Everyone
Keywords: out-of-distribution generalization, domain generalization, pre-training, fine-tuning
TL;DR: We show that larger models and larger datasets need to be simultaneously leveraged to improve OOD performance.
Abstract: Generalizing to out-of-distribution (OOD) data, that is, data from domains unseen during training, is a key challenge in modern machine learning that has only recently received much attention. Some existing approaches propose leveraging larger models and pre-training on larger datasets. In this paper, we provide new insights into applying these approaches. Concretely, we show that larger models and larger datasets need to be leveraged simultaneously to improve OOD performance on image classification. Moreover, we show that using smaller learning rates during fine-tuning is critical to achieving good results, contrary to the popular intuition that larger learning rates generalize better when training from scratch. We also show that strategies that improve in-distribution accuracy may, counter-intuitively, lead to poor OOD performance. Our insights culminate in a method that achieves state-of-the-art results on a number of OOD generalization benchmark tasks, often by a significant margin.