Head2Toe: Utilizing Intermediate Representations for Better OOD Generalization

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submission
Keywords: efficient training, transfer learning, efficient transfer, fine-tuning, computer vision, linear probe
Abstract: Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art alternative, fine-tuning all parameters of the source model to the target domain, possibly because fine-tuning allows the model to leverage useful information from intermediate layers that is otherwise discarded by the later pretrained layers. We explore the hypothesis that these intermediate layers might be directly exploited by linear probing. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on the VTAB benchmark, Head2Toe matches the performance obtained with fine-tuning on average, but critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.
One-sentence Summary: We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of a pretrained source model in order to achieve better out-of-distribution generalization.
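
To make the idea described in the abstract concrete, the following is a minimal sketch of probing intermediate representations, assuming a torchvision ResNet-50 backbone; the tap points, pooling, and linear head here are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch: pool activations from several intermediate layers of a frozen backbone,
# concatenate them into one "head-to-toe" feature vector, and train a linear head.
# Assumes a recent torchvision (>= 0.13) for the weights enum; layer choice is illustrative.
import torch
import torch.nn as nn
from torchvision import models


def collect_intermediate_features(backbone, layers, x):
    """Run `x` through `backbone`, capturing pooled activations from `layers`."""
    feats = []
    hooks = []

    def make_hook(store):
        def hook(_module, _inp, out):
            # Global-average-pool spatial maps so every layer yields a flat vector.
            store.append(out.mean(dim=(2, 3)) if out.dim() == 4 else out.flatten(1))
        return hook

    for layer in layers:
        hooks.append(layer.register_forward_hook(make_hook(feats)))
    with torch.no_grad():
        backbone(x)  # backbone stays frozen; we only need the hooked activations
    for h in hooks:
        h.remove()
    return torch.cat(feats, dim=1)


backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
tap_points = [backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4]

x = torch.randn(8, 3, 224, 224)      # stand-in batch from the target domain
features = collect_intermediate_features(backbone, tap_points, x)

num_classes = 10                      # hypothetical target-task label count
head = nn.Linear(features.shape[1], num_classes)  # only this head is trained
```

Note that the full method also selects a subset of the concatenated features before training the head; that selection step is omitted from this sketch for brevity.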
