Unveiling the Dynamics of Transfer Learning Representations

Published: 02 Mar 2024, Last Modified: 05 May 2024
Venue: ICLR 2024 Workshop Re-Align Poster
License: CC BY 4.0
Track: long paper (up to 9 pages)
Keywords: Transfer learning, cross-domain adaptation, representation similarity analysis
Abstract: Representation similarity analysis is used to study the dynamics of neural networks. When it is used to measure the importance of layers during fine-tuning, early layers show less representation change than later layers, which supports freezing early layers during fine-tuning. In this paper, we discuss how such similarity scores of representations should be interpreted. We argue that the scalar similarity score between the representations of a trained and an untrained network should not be interpreted directly. Moreover, similarity values obtained by comparing learned representations to their representations at initialization should not be compared across layers to judge layer importance. Instead, similarity scores should be set in proportion to scores from comparable reference problems before they are assessed. This can be done through a controlled randomization of the dataset that covers the spectrum from the original data to fully random data. We find that the representation change depends on the size of the training data, its structure, and, if the network is pre-trained, how close the task is to the pre-training task. If a dataset lacks a meaningful hierarchical structure, smaller networks tend to "unlearn" the knowledge of the pre-trained network, whereas larger networks still use their learned capabilities.
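The abstract sketches two technical ingredients: a similarity measure between trained and initialized representations, and a controlled label randomization that sweeps a dataset from original to fully random. The following sketch is only an illustration of that kind of analysis, not the paper's code: it assumes linear CKA as the similarity measure (one common choice; the paper's exact metric may differ), and the names `linear_cka` and `randomize_labels` as well as the toy data are our own.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices of shape (n_samples, n_features). Scores near 1 mean
    highly similar representations, scores near 0 unrelated ones."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, ord="fro")
                   * np.linalg.norm(Y.T @ Y, ord="fro"))

def randomize_labels(labels, fraction, num_classes, seed=0):
    """Replace `fraction` of the labels with uniformly random classes:
    fraction=0.0 keeps the original dataset, fraction=1.0 makes the
    labels fully random, covering the spectrum in between."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    idx = rng.choice(len(labels), size=int(fraction * len(labels)),
                     replace=False)
    labels[idx] = rng.integers(0, num_classes, size=len(idx))
    return labels

# Toy check with synthetic "representations": a matrix compared against
# a slightly noisy copy of itself scores high, against an unrelated
# random matrix it scores low.
rng = np.random.default_rng(0)
reps_init = rng.standard_normal((500, 64))
reps_similar = reps_init + 0.1 * rng.standard_normal((500, 64))
reps_random = rng.standard_normal((500, 64))
print(linear_cka(reps_init, reps_similar))  # close to 1
print(linear_cka(reps_init, reps_random))   # close to 0

# Label sweep from original (0.0) to fully random (1.0); in the paper's
# setting one would fine-tune on each randomized variant and compare
# each layer's representation to the same layer at initialization.
labels = rng.integers(0, 10, size=500)
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    noisy = randomize_labels(labels, fraction, num_classes=10)
    print(fraction, float((noisy != labels).mean()))
```

The sweep prints the fraction of labels that actually changed at each level (slightly below `fraction`, since a random draw can coincide with the original label); per-layer CKA scores from such a sweep would provide the reference points against which a fine-tuning similarity score can be proportioned.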
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 22