Keywords: Layer Normalization, Visual foundation models, Fine-tuning
Abstract: Layer Normalization (LayerNorm) is crucial to the functionality of Vision Transformer Foundation models (ViTFs), yet its role in fine-tuning under data scarcity and domain shifts remains underexplored. Our study reveals that shifts in LayerNorm parameters after fine-tuning (LayerNorm shifts) are key indicators of a model's adaptation from a source to a target domain. The adaptation's success depends on how well the training samples represent the target domain's true distribution. These insights provide a theoretical foundation for connecting LayerNorm shifts with domain shifts.
Building on these insights, we introduce the Fine-tuning Shift Ratio (FSR) to quantify representation consistency and propose an innovative rescaling mechanism using a scalar ($\lambda$), inversely related to FSR.
This aligns LayerNorm shifts with optimal data representation conditions and includes a cyclic framework to improve fine-tuning.
Extensive experiments across various datasets and settings validate our approach. In Out-of-Pretraining (OOP) tasks, lower FSR and higher $\lambda$ highlight under-represented training samples, while ViTFs tuned for In-Pretraining (IP) scenarios favor conservative updates. Our findings illuminate LayerNorm dynamics in transfer learning, offering practical fine-tuning strategies.
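The abstract does not give the exact formula for FSR or $\lambda$; the sketch below is one plausible reading, assuming FSR measures the magnitude of the LayerNorm parameter shift relative to the pretrained parameters, and $\lambda$ is any monotonically decreasing function of FSR used to rescale the shift. The function names `fsr`, `inverse_lambda`, and `rescale_shift` are illustrative, not the paper's API.

```python
import numpy as np

def fsr(pre: np.ndarray, ft: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical Fine-tuning Shift Ratio: norm of the LayerNorm
    parameter shift relative to the pretrained parameter norm."""
    shift = np.linalg.norm(ft - pre)
    return float(shift / (np.linalg.norm(pre) + eps))

def inverse_lambda(r: float) -> float:
    """One hedged choice of a scalar inversely related to FSR;
    the paper's exact functional form may differ."""
    return 1.0 / (1.0 + r)

def rescale_shift(pre: np.ndarray, ft: np.ndarray) -> np.ndarray:
    """Rescale the LayerNorm shift by lambda: small FSR (under-represented
    data in this reading) yields lambda near 1, keeping more of the shift."""
    lam = inverse_lambda(fsr(pre, ft))
    return pre + lam * (ft - pre)

# Toy usage: gamma parameters of one LayerNorm before/after fine-tuning
pre_gamma = np.ones(8)
ft_gamma = np.ones(8) * 1.2
print(fsr(pre_gamma, ft_gamma))          # relative shift magnitude
print(rescale_shift(pre_gamma, ft_gamma))  # shift damped by lambda
```

In this sketch, the rescaled parameters interpolate between the pretrained and fine-tuned values, with the interpolation weight set by FSR; a cyclic framework, as the abstract describes, would alternate fine-tuning and rescaling.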
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 7106