Latent Wavelet Diffusion For Ultra High-Resolution Image Synthesis

ICLR 2026 Conference Submission19510 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Generative Models, Diffusion Models, Wavelet, Ultra High-Resolution
TL;DR: We enhance Ultra High-Resolution image generation by decomposing latent features into wavelet subbands, allowing the model to focus on frequency-specific refinement during diffusion.
Abstract: High-resolution image synthesis remains a core challenge in generative modeling, particularly in balancing computational efficiency with the preservation of fine-grained visual detail. We present $\textit{Latent Wavelet Diffusion (LWD)}$, a lightweight training framework that significantly improves detail and texture fidelity in ultra-high-resolution (2K-4K) image synthesis. LWD introduces a novel, frequency-aware masking strategy derived from wavelet energy maps, which dynamically focuses the training process on detail-rich regions of the latent space. This is complemented by a scale-consistent VAE objective to ensure high spectral fidelity. The primary advantage of our approach is its efficiency: LWD requires no architectural modifications and adds zero additional cost during inference, making it a practical solution for scaling existing models. Across multiple strong baselines, LWD consistently improves perceptual quality and FID scores, demonstrating the power of signal-driven supervision as a principled and efficient path toward high-resolution generative modeling.
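The paper itself does not publish the masking procedure on this page, but the idea described in the abstract — deriving a spatial mask from the energy of wavelet detail subbands of the latent — can be sketched as follows. This is a hypothetical, minimal NumPy illustration with a single-level Haar transform; the function name, quantile threshold, and upsampling choice are assumptions, not the authors' implementation.

```python
import numpy as np

def haar_energy_mask(latent, quantile=0.75):
    """Sketch of a frequency-aware mask from a single-level Haar
    wavelet energy map of a latent tensor of shape (C, H, W).
    Assumes H and W are even. Returns a (H, W) binary mask that is
    1 over the most detail-rich regions."""
    a = latent[:, 0::2, 0::2]  # top-left of each 2x2 block
    b = latent[:, 0::2, 1::2]  # top-right
    c = latent[:, 1::2, 0::2]  # bottom-left
    d = latent[:, 1::2, 1::2]  # bottom-right
    # High-frequency Haar subbands (LH, HL, HH) capture local detail.
    lh = (a - b + c - d) / 2.0
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    # Energy map: squared detail coefficients summed over channels.
    energy = (lh**2 + hl**2 + hh**2).sum(axis=0)
    # Keep only regions above the chosen energy quantile.
    thresh = np.quantile(energy, quantile)
    mask = (energy >= thresh).astype(np.float32)
    # Nearest-neighbor upsample back to the latent resolution.
    return np.repeat(np.repeat(mask, 2, axis=0), 2, axis=1)
```

During training, such a mask could reweight the diffusion loss toward high-energy (texture-rich) latent regions; since the mask is only used in the loss, inference is unchanged, consistent with the abstract's claim of zero added inference cost.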
Supplementary Material: zip
Primary Area: generative models
Submission Number: 19510