Keywords: Domain generalization
TL;DR: We propose the Layer-Decomposition Training (LDT) strategy, which separates stable and unstable layers in a fine-grained, layer-wise manner, mitigating the feature-distribution perturbations that misclassified unstable parameters cause in existing methods.
Abstract: Domain generalization methods can effectively improve network performance on test samples with unknown distributions by isolating gradients between unstable and stable parameters. However, existing methods partition stable and unstable parameters at a relatively coarse granularity, so misclassified unstable parameters degrade the network's feature-processing capability. We first provide a theoretical analysis of the gradient perturbations caused by unstable parameters. Building on this analysis, we propose Layer-Decomposition Training (LDT), which performs fine-grained, layer-wise partitioning guided by each layer's instability level, substantially improving the stability of parameter updates. Furthermore, to address gradient-magnitude disparities within the stable and unstable layer groups, we introduce a Dynamic Parameter Update (DPU) strategy that adaptively determines layer-specific update coefficients according to gradient variations, improving feature-learning efficiency. Extensive experiments across diverse tasks (super-resolution, classification) and architectures (Transformer, Mamba, CNN) demonstrate LDT's superior generalization capability. Our code is available at ***.
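To make the abstract's two components concrete, below is a minimal PyTorch-style sketch of (i) layer-wise partitioning by an instability score and (ii) per-layer adaptive update coefficients. The scoring rule (cross-domain gradient variance), the threshold, and the coefficient formula are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of layer-wise partitioning + dynamic per-layer updates.
# The instability score, threshold, and coefficient rule are assumptions
# made for illustration only; the paper's actual criteria may differ.
import torch
import torch.nn as nn


def layer_instability(grads_per_domain):
    """Score a parameter tensor by the variance of its gradients across
    training domains (assumed proxy for instability)."""
    stacked = torch.stack([g.flatten() for g in grads_per_domain])
    return stacked.var(dim=0).mean().item()


def partition_layers(model, domain_batches, loss_fn, threshold=1e-4):
    """Split parameters into stable / unstable sets by instability score."""
    grads = {name: [] for name, _ in model.named_parameters()}
    for x, y in domain_batches:                       # one batch per domain
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                grads[name].append(p.grad.detach().clone())
    stable, unstable = set(), set()
    for name, g_list in grads.items():
        if g_list and layer_instability(g_list) < threshold:
            stable.add(name)
        else:
            unstable.add(name)
    return stable, unstable


def dynamic_update(model, stable, unstable, base_lr=1e-3):
    """Apply per-layer update coefficients scaled by gradient magnitude,
    damping unstable layers (one plausible instantiation of the DPU idea)."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            scale = 0.1 if name in unstable else 1.0
            denom = p.grad.norm().clamp(min=1.0)      # normalize large gradients
            coeff = base_lr * scale / denom
            p.add_(p.grad, alpha=-coeff.item())
```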
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 15088