Abstract: Transfer learning is a powerful technique that enables model training with limited amounts of data, making it crucial in many data-scarce real-world applications. Typically, transfer learning protocols first transfer all the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapt only the task-specific readout layers to a data-poor target task. This workflow rests on two main assumptions: first, that the feature maps of the pre-trained model are qualitatively similar to those that would have been learned with enough data on the target task; second, that the source representations of the last hidden layers are always the most expressive. In this work, we demonstrate that this is not always the case and that the largest performance gain may be achieved when smaller portions of the pre-trained network are transferred. In particular, we perform a set of numerical experiments in a controlled setting, showing that the optimal transfer depth depends non-trivially on the amount of available training data and on the degree of source-target task similarity, and that it is often advantageous to transfer only the first layers. We then propose a strategy to detect the most promising source task among the available candidates. This approach compares the internal representations of a network trained entirely from scratch on the target task with those of the networks pre-trained on the potential source tasks.
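The source-selection strategy sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch, not the paper's exact procedure: it assumes per-layer activation matrices have already been extracted on a common probe set, and it uses linear CKA as the representation-similarity measure, which is an assumption since the abstract does not name the metric. All function and variable names are illustrative.

```python
# Hypothetical sketch: compare hidden-layer representations of a network trained
# from scratch on the (small) target task with those of candidate pre-trained
# networks, and pick the source whose representations are most similar.
# The similarity measure (linear CKA) is an assumption, not the paper's stated metric.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear Centered Kernel Alignment between two activation matrices
    of shape (n_samples, n_features); higher means more similar."""
    X = X - X.mean(axis=0)          # center features
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return hsic / (norm_x * norm_y)

def pick_best_source(target_acts, source_acts_by_task):
    """target_acts: list of per-layer activations from the scratch-trained target network.
    source_acts_by_task: dict mapping each candidate source task to the per-layer
    activations of its pre-trained network, computed on the same probe inputs.
    Returns the candidate with the highest average layer-wise similarity."""
    scores = {
        task: float(np.mean([linear_cka(t, s) for t, s in zip(target_acts, acts)]))
        for task, acts in source_acts_by_task.items()
    }
    return max(scores, key=scores.get), scores
```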
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Hanie_Sedghi1
Submission Number: 2385