Analyzing the Training Dynamics of Image Restoration Transformers: A Revisit to Layer Normalization

ICLR 2026 Conference Submission 14905 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Low-level vision, image restoration, network architecture, normalization
TL;DR: This work reveals and analyzes extreme feature statistics in image restoration transformers caused by LayerNorm, and provides a simple drop-in replacement.
Abstract: This work analyzes the training dynamics of Image Restoration (IR) Transformers and uncovers a critical yet overlooked issue: conventional LayerNorm (LN) drives feature magnitudes to diverge to a million scale and collapses channel-wise entropy. We analyze this from the perspective of networks attempting to bypass LayerNorm’s constraints, which conflict with IR tasks. Accordingly, we identify two misalignments: 1) per-token normalization that disrupts spatial correlations, and 2) input-independent scaling that discards input-specific statistics. To address them, we propose Image Restoration Transformer Tailored Layer Normalization (i-LN), a simple drop-in replacement that normalizes features holistically and adaptively rescales them per input. We provide theoretical insights and empirical evidence that this design effectively captures important spatial correlations and better preserves input-specific statistics throughout the network. Experimental results verify that the proposed i-LN consistently outperforms vanilla LN across various IR tasks.
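To make the contrast in the abstract concrete, below is a minimal, illustrative sketch of the two normalization schemes it describes: conventional per-token LN (statistics per spatial token, over channels) versus a holistic normalization over all tokens and channels jointly with an input-dependent rescaling. The function names and the exact form of the adaptive rescaling are assumptions for illustration only, not the paper's actual i-LN implementation.

```python
import math

def layer_norm_per_token(tokens, eps=1e-6):
    # Conventional LN: each token (a list of C channel values) is
    # normalized independently over its channels, so relative magnitude
    # differences BETWEEN tokens are discarded.
    out = []
    for t in tokens:
        mu = sum(t) / len(t)
        var = sum((v - mu) ** 2 for v in t) / len(t)
        inv = 1.0 / math.sqrt(var + eps)
        out.append([(v - mu) * inv for v in t])
    return out

def holistic_adaptive_norm(tokens, gamma=1.0, beta=0.0, eps=1e-6):
    # Illustrative sketch (assumption): statistics are computed over ALL
    # tokens and channels jointly, so spatial correlations between tokens
    # survive normalization; the input's own scale is then re-injected,
    # standing in for the paper's input-adaptive rescaling.
    flat = [v for t in tokens for v in t]
    mu = sum(flat) / len(flat)
    var = sum((v - mu) ** 2 for v in flat) / len(flat)
    scale = math.sqrt(var + eps)  # input-dependent, not a fixed learned scalar
    return [[gamma * scale * ((v - mu) / scale) + beta for v in t]
            for t in tokens]
```

As a quick illustration of the first misalignment: for the two tokens `[1, 2, 3]` and `[10, 20, 30]`, per-token LN maps both to the same normalized vector (the 10x difference in magnitude is lost), whereas the holistic variant keeps the second token larger than the first.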
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14905