Keywords: Layer Normalization, Visual foundation models, Fine-tuning
Abstract: Layer Normalization (LayerNorm) is crucial to the functionality of Vision Transformer Foundation models (ViTFs), yet its role in fine-tuning under data scarcity and domain shifts remains underexplored. Our study reveals that shifts in LayerNorm parameters after fine-tuning (LayerNorm shifts) are key indicators of a model's adaptation from a source to a target domain. The adaptation's success depends on how well the training samples represent the target domain's true distribution. These insights provide a theoretical foundation for connecting LayerNorm shifts with domain shifts.
Building on these insights, we introduce the Fine-tuning Shift Ratio (FSR) to quantify representation consistency and propose an innovative rescaling mechanism using a scalar ($\lambda$), inversely related to FSR.
This aligns LayerNorm shifts with optimal data representation conditions and includes a cyclic framework to improve fine-tuning.
Extensive experiments across various datasets and settings validate our approach. In Out-of-Pretraining (OOP) tasks, lower FSR and higher $\lambda$ highlight under-represented training samples, while ViTFs tuned for In-Pretraining (IP) scenarios favor conservative updates. Our findings illuminate LayerNorm dynamics in transfer learning, offering practical fine-tuning strategies.
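The abstract does not give the exact formula for FSR or $\lambda$; the sketch below is one plausible reading, assuming FSR measures the magnitude of the LayerNorm parameter shift relative to the pretrained parameters, and $\lambda$ is any monotonically decreasing function of FSR used to rescale the shift. The function names `fsr`, `inverse_lambda`, and `rescale_shift` are illustrative, not the paper's API.

```python
import numpy as np

def fsr(pre: np.ndarray, ft: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical Fine-tuning Shift Ratio: norm of the LayerNorm
    parameter shift relative to the pretrained parameter norm."""
    shift = np.linalg.norm(ft - pre)
    return float(shift / (np.linalg.norm(pre) + eps))

def inverse_lambda(r: float) -> float:
    """One hedged choice of a scalar inversely related to FSR;
    the paper's exact functional form may differ."""
    return 1.0 / (1.0 + r)

def rescale_shift(pre: np.ndarray, ft: np.ndarray) -> np.ndarray:
    """Rescale the LayerNorm shift by lambda: small FSR (under-represented
    data in this reading) yields lambda near 1, keeping more of the shift."""
    lam = inverse_lambda(fsr(pre, ft))
    return pre + lam * (ft - pre)

# Toy usage: gamma parameters of one LayerNorm before/after fine-tuning
pre_gamma = np.ones(8)
ft_gamma = np.ones(8) * 1.2
print(fsr(pre_gamma, ft_gamma))          # relative shift magnitude
print(rescale_shift(pre_gamma, ft_gamma))  # shift damped by lambda
```

In this sketch, the rescaled parameters interpolate between the pretrained and fine-tuned values, with the interpolation weight set by FSR; a cyclic framework, as the abstract describes, would alternate fine-tuning and rescaling.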
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 7106