Atrous Learning for Diffusion Models

03 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: generative models, flow matching, locality
TL;DR: We propose a simple masking strategy to improve the contextual representation in diffusion models.
Abstract: Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locality of their denoising mechanisms. For example, a diffusion model trained on natural images might generate hands with six fingers. To mitigate this issue, we propose atrous learning for diffusion models, a simple yet effective masking strategy that can be implemented with only a few lines of code. Experiments show that it is surprisingly safe to mask up to 98% of pixels for diffusion model training. Our method attains competitive FID scores across datasets and avoids training instability on small datasets. Moreover, the masking strategy reduces memorization and promotes the use of broader contextual information during generation.
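The abstract does not say which mask pattern is used, whether the mask applies to inputs, targets, or the loss, or how it is scheduled. The sketch below is therefore only an illustration under stated assumptions and is not the authors' implementation. It takes "atrous" in its usual sense of "with holes" (as in atrous, or dilated, convolution) and assumes a hypothetical dilated-grid mask that keeps one pixel per stride x stride cell. It further assumes the mask restricts a flow-matching loss, using the common linear path x_t = (1 - t) x0 + t x1 with target velocity x1 - x0. The function names `atrous_mask` and `masked_flow_matching_loss` are hypothetical. With stride 7, about 1/49 of pixels are kept, so roughly 98% are masked, which matches the ratio quoted above.

```python
import random


def atrous_mask(height, width, stride, rng):
    """Hypothetical dilated-grid ("atrous-style") mask.

    Keeps one pixel per stride x stride cell at a random (dy, dx) offset
    shared by the whole image; True marks a kept pixel. The kept fraction
    is about 1 / stride**2.
    """
    dy = rng.randrange(stride)
    dx = rng.randrange(stride)
    return [[(y % stride == dy) and (x % stride == dx) for x in range(width)]
            for y in range(height)]


def masked_flow_matching_loss(x0, x1, t, velocity_fn, mask):
    """Squared velocity error averaged over kept pixels only.

    Assumes the linear path x_t = (1 - t) * x0 + t * x1 (x0 = noise,
    x1 = data), whose target velocity is x1 - x0.
    """
    height, width = len(x0), len(x0[0])
    xt = [[(1 - t) * x0[y][x] + t * x1[y][x] for x in range(width)]
          for y in range(height)]
    v_pred = velocity_fn(xt, t)  # the model still sees the full x_t
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                target = x1[y][x] - x0[y][x]
                total += (v_pred[y][x] - target) ** 2
                count += 1
    return total / max(count, 1)


# Usage on a toy 70 x 70 single-channel "image".
rng = random.Random(0)
H = W = 70
mask = atrous_mask(H, W, stride=7, rng=rng)
kept = sum(sum(row) for row in mask)
print(kept, round(1 - kept / (H * W), 4))  # 100 0.9796

x0 = [[0.0] * W for _ in range(H)]  # stand-in for noise
x1 = [[1.0] * W for _ in range(H)]  # stand-in for data
zero_model = lambda xt, t: [[0.0] * len(row) for row in xt]
loss = masked_flow_matching_loss(x0, x1, 0.5, zero_model, mask)
print(loss)  # 1.0
```

In this sketch only the loss is masked, so the model still receives the full noisy input, and the random offset changes the kept pixels from sample to sample. Masking the input instead, or using a random rather than grid-structured pattern, are equally plausible readings of the abstract.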
Primary Area: generative models
Submission Number: 1695