Is Layer Normalization Fine-tuning Sufficient for Visual Distribution Shifts?

16 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0
Keywords: Layer Normalization, Visual foundation models, Fine-tuning
Abstract: Layer Normalization (LayerNorm) is crucial to the functionality of Vision Transformer Foundation models (ViTFs), yet its role in fine-tuning under data scarcity and domain shifts remains underexplored. Our study reveals that shifts in LayerNorm parameters (LayerNorm shifts) after fine-tuning are key indicators of a model's adaptation from a source to a target domain. The success of this adaptation depends on how well the training samples represent the target domain's true distribution. These insights provide a theoretical foundation connecting LayerNorm shifts with domain shifts. Building on them, we introduce the Fine-tuning Shift Ratio (FSR) to quantify representation consistency and propose a rescaling mechanism based on a scalar ($\lambda$) inversely related to FSR. This mechanism aligns LayerNorm shifts with well-represented data conditions and is embedded in a cyclic framework that improves fine-tuning. Extensive experiments across various datasets and settings validate our approach. In Out-of-Pretraining (OOP) tasks, lower FSR and higher $\lambda$ highlight under-represented training samples, while ViTFs tuned for In-Pretraining (IP) scenarios favor conservative updates. Our findings illuminate LayerNorm dynamics in transfer learning, offering practical fine-tuning strategies.
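The abstract's core mechanism can be sketched in a few lines. The snippet below is an illustrative assumption of the idea, not the paper's exact method: the concrete FSR formula, the `reference_scale` parameter, and the inverse relation `lambda = 1 / FSR` are all hypothetical stand-ins for definitions the abstract does not give.

```python
# Hedged sketch: rescale LayerNorm parameter shifts by a scalar lambda
# inversely related to a Fine-tuning Shift Ratio (FSR).
# FSR formula and lambda schedule here are illustrative assumptions.

def layernorm_shift(pretrained, finetuned):
    """Element-wise shift of LayerNorm parameters after fine-tuning."""
    return [ft - pt for pt, ft in zip(pretrained, finetuned)]

def fsr(shift, reference_scale=1.0):
    """Assumed FSR: mean absolute shift relative to a reference scale."""
    return sum(abs(s) for s in shift) / (len(shift) * reference_scale)

def rescale(pretrained, shift, lam):
    """Reapply the shift scaled by lambda; larger lambda amplifies the update."""
    return [pt + lam * s for pt, s in zip(pretrained, shift)]

# Toy LayerNorm gain vector before and after fine-tuning.
gamma_pre = [1.0, 1.0, 1.0, 1.0]
gamma_ft = [1.2, 0.9, 1.1, 1.0]

shift = layernorm_shift(gamma_pre, gamma_ft)
ratio = fsr(shift)                # small FSR -> under-represented samples
lam = 1.0 / (ratio + 1e-8)        # assumed inverse relation lambda ~ 1/FSR
gamma_new = rescale(gamma_pre, shift, lam)
```

In a cyclic framework, this rescaling would be interleaved with further fine-tuning rounds, so low-FSR (under-represented) settings receive amplified LayerNorm updates while in-pretraining settings stay close to the pretrained parameters.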
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 7106