Keywords: pretraining, distribution shift, finetuning, robustness
TL;DR: Many important factors in the pre-training data distribution that may help improve downstream accuracy do not actually affect robustness on downstream tasks.
Abstract: Our work studies the implications of transfer learning on model behavior beyond accuracy: how does the pre-training distribution affect the downstream robustness of a fine-tuned model? We analyze effective robustness using the framework proposed by Taori et al. (2020), which demonstrates that in-distribution and out-of-distribution performance are highly correlated along a linear trend. We explore various interventions that significantly alter the pre-training distribution, including the label space, the label semantics, and the pre-training dataset itself. In most cases, these changes during pre-training have minimal impact on the linear trend produced by models pre-trained on the full ImageNet dataset. We demonstrate these findings on pre-training distributions constructed from ImageNet and iNaturalist, with iWildCam-WILDS animal classification as the fine-tuning task.
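To make the notion of effective robustness concrete, the following is a minimal sketch, not the authors' implementation. It assumes the baseline trend is fit by ordinary least squares on logit-transformed in-distribution (ID) and out-of-distribution (OOD) accuracies, and that effective robustness is a model's OOD accuracy minus the value the baseline trend predicts at its ID accuracy. The function names and the use of plain accuracy as the metric are illustrative assumptions.

```python
import math


def logit(p, eps=1e-6):
    """Map an accuracy in (0, 1) to logit space, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: map a logit-space value back to an accuracy."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_baseline_trend(id_accs, ood_accs):
    """Least-squares line through (logit(ID), logit(OOD)) for baseline models.

    Assumption: the linear trend is fit in logit-transformed accuracy space.
    Returns (slope, intercept).
    """
    xs = [logit(a) for a in id_accs]
    ys = [logit(a) for a in ood_accs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def effective_robustness(id_acc, ood_acc, slope, intercept):
    """OOD accuracy minus the OOD accuracy the baseline trend predicts."""
    predicted_ood = sigmoid(slope * logit(id_acc) + intercept)
    return ood_acc - predicted_ood
```

Under this sketch, a model that lies on the baseline trend has effective robustness near zero, so an intervention on the pre-training distribution only counts as improving robustness if it moves fine-tuned models above the line rather than along it.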