The Role of Pre-training Data in Transfer Learning

Published: 01 Feb 2023, Last Modified: 12 Mar 2024. Submitted to ICLR 2023.
Keywords: pretraining, transfer learning, supervised training, contrastive learning, clip, simclr
TL;DR: We investigate the role of the pre-training distribution, data curation, dataset size, and loss in downstream transfer learning.
Abstract: The transfer learning paradigm of model pre-training and subsequent fine-tuning produces high-accuracy models. However, a question remains: what data and method should be used for pre-training? We study the effect of the pre-training distribution on transfer learning in the context of image classification. Through controlled experiments, we find that the pre-training dataset is initially important for low-shot transfer. However, the differences between distributions diminish as more data is made available for fine-tuning. Still, fine-tuning outperforms training from scratch. We also investigate dataset size and observe that larger pre-training datasets lead to better accuracy; however, the absolute accuracy difference is largest in the few-shot regime. Beyond data, we study the effect of the pre-training method, comparing language-image contrastive with image-image contrastive, and find that the latter usually leads to better transfer accuracy.
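The image-image contrastive pre-training compared in the abstract (as in SimCLR) optimizes the NT-Xent objective: embeddings of two augmented views of the same image are pulled together while all other images in the batch act as negatives. A minimal NumPy sketch of that loss follows; the function name, array shapes, and temperature value are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss,
    the image-image contrastive objective used in SimCLR.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    Temperature 0.5 is an illustrative default.
    """
    z = np.concatenate([z1, z2], axis=0)                # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # L2-normalize rows
    sim = z @ z.T / temperature                         # (2N, 2N) cosine sims
    n = z1.shape[0]
    # The positive for row i is the other view of the same image: i+N (mod 2N).
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Exclude self-similarity from the softmax denominator.
    np.fill_diagonal(sim, -np.inf)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

When the two views of each image embed close together, the loss is low; for unrelated (random) embeddings it approaches log of the number of candidates, so well-aligned views yield a smaller loss than random pairs.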
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2302.13602/code)