Keywords: Diffusion Models, Curriculum Training, Training Dynamics, Feature Learning, Generalization, Theory of Deep Learning
Abstract: Training a diffusion model (DM) amounts to solving a continuum of denoising subproblems whose difficulty varies with the noise level. The default practice samples these subproblems uniformly and updates all parameters jointly, which entangles feature representations and degrades generation. Recent empirical work suggests that presenting subproblems in an easy-to-hard order helps, yet why it helps and whether further gains are possible remain open.
We propose a curriculum-based framework that schedules two ingredients during optimization: the difficulty of the denoising subproblems, and the share of parameters that is allowed to evolve. Training starts from easier subproblems on a subset of trainable neurons; harder subproblems are introduced later, with the remaining neurons gradually unlocked. The reserved capacity protects subtle, low-amplitude features from being overwritten while the network is still fitting coarse structure.
We provide the first analysis connecting this curriculum to the training dynamics and generalization error of DMs, identifying a coarse-to-fine learning order that emerges implicitly under our schedule. Experiments on multiple datasets and architectures confirm the predicted gains over uniform-noise training.
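To make the two scheduled ingredients concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a linear ramp, a noise level normalized to [0, 1], and a per-neuron boolean mask that gates gradient updates. The names `curriculum_state`, `sample_noise_levels`, `masked_update`, `start_frac`, and `easy_is_high_noise` are hypothetical. The abstract does not state which end of the noise range counts as easy, so that choice is left as a flag.

```python
import numpy as np


def curriculum_state(step, total_steps, n_neurons,
                     start_frac=0.25, easy_is_high_noise=True):
    """Return (noise_lo, noise_hi, mask) for a training step.

    Assumption: both the open noise range and the share of trainable
    neurons grow linearly from start_frac to 1 over training.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    open_frac = start_frac + (1.0 - start_frac) * progress

    # Assumption: "easy" subproblems sit at one end of the noise range;
    # which end is not specified in the abstract, hence the flag.
    if easy_is_high_noise:
        noise_lo, noise_hi = 1.0 - open_frac, 1.0
    else:
        noise_lo, noise_hi = 0.0, open_frac

    # Unlock the first n_trainable neurons; the rest stay frozen.
    n_trainable = int(round(open_frac * n_neurons))
    mask = np.zeros(n_neurons, dtype=bool)
    mask[:n_trainable] = True
    return noise_lo, noise_hi, mask


def sample_noise_levels(rng, noise_lo, noise_hi, batch_size):
    """Sample noise levels uniformly from the currently open range."""
    return rng.uniform(noise_lo, noise_hi, size=batch_size)


def masked_update(weights, grads, mask, lr):
    """Gradient step that leaves frozen neurons (rows) unchanged.

    weights and grads have shape (n_neurons, d); mask has shape (n_neurons,).
    """
    return weights - lr * grads * mask[:, None]
```

In this sketch the frozen rows play the role of the reserved capacity: they receive no updates while the open noise range is still narrow, and become trainable as harder subproblems enter the sampling range.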
Submission Number: 150