Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

Xiao Li; Sheng Liu; Jinxin Zhou; Xinyu Lu; Carlos Fernandez-Granda; Zhihui Zhu; Qing Qu

Understanding and Improving Transfer Learning of Deep Models via Neural Collapse

Xiao Li, Sheng Liu, Jinxin Zhou, Xinyu Lu, Carlos Fernandez-Granda, Zhihui Zhu, Qing Qu

Published: 09 Jun 2024, Last Modified: 17 Sept 2024Accepted by TMLREveryoneRevisionsBibTeXCC BY 4.0

Abstract: With the ever-increasing complexity of large-scale pre-trained models coupled with a shortage of labeled data for downstream training, transfer learning has become the primary approach in many fields, including natural language processing, computer vision, and multi-modal learning. Despite recent progress, the fine-tuning process for large-scale pre-trained models in vision still mostly relies on trial and error. This work investigates the relationship between neural collapse (NC) and transfer learning for classification problems. NC is an intriguing while prevalent phenomenon that has been recently discovered in terms of the final-layer features and linear classifiers of trained neural networks. Specifically, during the terminal phase of training, NC implies that the variability of the features within each class diminishes to zero, while the means of features between classes are maximally and equally distanced. In this work, we examine the NC attributes of pre-trained models on both downstream and training data for transfer learning, and we find strong correlation between feature collapse and downstream performance. In particular, we discovered a systematic pattern that emerges when linear probing pre-trained models on downstream training data: the more feature collapse of pre-trained models on downstream data, the higher the transfer accuracy. Additionally, we also studied the relationship between NC and transfer accuracy on the training data. Moreover, these findings allow us to develop a principled, parameter-efficient fine-tuning method that employs skip-connection to induce the last-layer feature collapse on downstream data. Our proposed fine-tuning methods deliver good performances while reducing fine-tuning parameters by at least 90\% and mitigating overfitting in situations especially when the downstream data is scarce.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: Based on the AE and reviewers' suggestions, we have made the following revisions: (1) Relocated the following discussions and empirical results from the Appendix to their appropriate positions within the main body of the paper: (i) NC and direct classifier measurement, (ii) the impact of varying the number of unfrozen layers in layer FT, and (iii) the efficacy of intermediate layers in layer FT. (2) In both Table 1 and throughout the manuscript, we further clarified when and how NC is measured in the relevant sections. (3) Corrected citation usage throughout the manuscript and eliminated as many typos as possible.

Supplementary Material: zip

Assigned Action Editor: ~Simon_Kornblith1

Submission Number: 2042

Loading