Scalable Transfer Learning with Expert Models

Joan Puigcerver; Carlos Riquelme Ruiz; Basil Mustafa; Cedric Renggli; André Susano Pinto; Sylvain Gelly; Daniel Keysers; Neil Houlsby

Published: 12 Jan 2021, Last Modified: 05 May 2023

Venue: ICLR 2021 Poster

Readers: Everyone

Keywords: Transfer Learning, Expert Models, Few Shot

Abstract: Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.
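The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that the "cheap-to-compute performance proxy" is leave-one-out k-nearest-neighbour accuracy computed on frozen expert features; the abstract does not name the proxy, so this choice, along with the function names `knn_proxy_accuracy` and `select_expert`, is hypothetical and used only for illustration. No expert is fine-tuned and no pre-training data is touched when scoring.

```python
import numpy as np


def knn_proxy_accuracy(features, labels, k=1):
    """Leave-one-out kNN accuracy on frozen features (assumed proxy score).

    `labels` must be non-negative integers (required by np.bincount).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all target examples.
    sq_norms = (features ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    # An example must not vote for itself.
    np.fill_diagonal(dists, np.inf)
    # Indices of the k closest other examples for each row.
    nn_idx = np.argsort(dists, axis=1)[:, :k]
    nn_labels = labels[nn_idx]
    # Majority vote among the k neighbours.
    preds = np.array([np.bincount(row).argmax() for row in nn_labels])
    return float((preds == labels).mean())


def select_expert(expert_features, labels, k=1):
    """Score every expert with the proxy and return the best one.

    `expert_features` maps a (hypothetical) expert name to the features that
    expert produces for the same target-task examples.
    """
    scores = {
        name: knn_proxy_accuracy(feats, labels, k)
        for name, feats in expert_features.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage: two experts embed the same four labelled target examples.
labels = np.array([0, 0, 1, 1])
expert_features = {
    "expert_a": np.array([[0.0], [0.1], [10.0], [10.1]]),  # classes separated
    "expert_b": np.array([[0.0], [10.0], [0.1], [10.1]]),  # classes interleaved
}
best, scores = select_expert(expert_features, labels)
print(best, scores)
```

Only the expert selected this way would then be fine-tuned on the target task, which is why the per-task cost stays small relative to methods that revisit the pre-training data.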

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [ImageNet](https://paperswithcode.com/dataset/imagenet), [Stanford Cars](https://paperswithcode.com/dataset/stanford-cars)
