Nested Diffusion Models using Hierarchical Latent Priors

Xiao Zhang; Ruoxi Jiang; Rebecca Willett; Michael Maire

27 Sept 2024 (modified: 14 Oct 2024). ICLR 2025 Conference Withdrawn Submission. License: CC BY 4.0

Keywords: Hierarchical Generative Models, Diffusion Models.

TL;DR: We introduce nested diffusion models that hierarchically generate semantic latent features, leading to a significant improvement in generation quality.

Abstract: We introduce nested diffusion models, an efficient and powerful hierarchical generative framework that substantially enhances the generation quality of diffusion models, particularly for images of complex scenes. Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels. Each model in the series is conditioned on the output of the preceding, higher-level model, and the final model in the series generates the image. Hierarchical latent variables guide the generation process along predefined semantic pathways, allowing our approach to capture intricate structural details while significantly improving image quality. To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations, and apply a series of compression techniques (spatial pooling, channel reduction, and noise injection) to control the information capacity at each level of the hierarchy. Across multiple benchmarks, including class-conditioned generation on ImageNet-1k and text-conditioned generation on the COCO dataset, our system yields notable improvements in image quality as measured by FID. These improvements incur only slight additional computational cost, because the more abstract levels of the hierarchy operate on lower-dimensional representations. Our method also enhances unconditional generation (which uses neither text nor class labels), narrowing its performance gap with conditional generation.
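The abstract names three compression operations (spatial pooling, channel reduction, noise injection) but gives no further detail. The following is a minimal, hypothetical sketch of how such operations could turn one encoder feature map into a stack of latents whose size shrinks as the level becomes more abstract. Everything concrete here is an assumption, not the authors' method: the feature map is random data standing in for a pre-trained encoder's output, the channel reduction is a fixed random linear projection (a stand-in for whatever reduction the paper actually uses), and the pooling factors, channel counts, and noise levels are illustrative only.

```python
import random

# A feature map is stored as nested lists indexed [row][col][channel].
FeatureMap = list[list[list[float]]]


def spatial_pool(feat: FeatureMap, k: int) -> FeatureMap:
    """Average non-overlapping k x k windows; assumes height and width divisible by k."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    pooled = []
    for i in range(0, h, k):
        row = []
        for j in range(0, w, k):
            acc = [0.0] * c
            for di in range(k):
                for dj in range(k):
                    for ch, v in enumerate(feat[i + di][j + dj]):
                        acc[ch] += v
            row.append([v / (k * k) for v in acc])
        pooled.append(row)
    return pooled


def reduce_channels(feat: FeatureMap, proj: list[list[float]]) -> FeatureMap:
    """Linearly project each C-dim feature vector to len(proj) dims."""
    return [[[sum(p * x for p, x in zip(p_row, cell)) for p_row in proj]
             for cell in row] for row in feat]


def inject_noise(feat: FeatureMap, sigma: float, rng: random.Random) -> FeatureMap:
    """Add i.i.d. Gaussian noise with standard deviation sigma to every element."""
    return [[[x + rng.gauss(0.0, sigma) for x in cell] for cell in row] for row in feat]


def size(feat: FeatureMap) -> int:
    """Total number of scalar elements in a feature map."""
    return len(feat) * len(feat[0]) * len(feat[0][0])


def build_hierarchy(feat: FeatureMap, levels: list[tuple[int, int, float]],
                    seed: int = 0) -> list[FeatureMap]:
    """Return one compressed latent per (pool factor, out channels, sigma) level."""
    rng = random.Random(seed)
    in_ch = len(feat[0][0])
    latents = []
    for k, out_ch, sigma in levels:
        # Hypothetical channel reduction: a fixed random projection matrix.
        proj = [[rng.gauss(0.0, 1.0 / in_ch ** 0.5) for _ in range(in_ch)]
                for _ in range(out_ch)]
        z = reduce_channels(spatial_pool(feat, k), proj)
        latents.append(inject_noise(z, sigma, rng))
    return latents


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical 8x8 encoder output with 16 channels (1024 values in total).
    features = [[[data_rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]
                for _ in range(8)]
    # Illustrative levels, from most abstract (coarse, few channels, noisier) down.
    levels = [(8, 4, 0.5), (4, 8, 0.2), (2, 16, 0.0)]
    for z in build_hierarchy(features, levels):
        print(len(z), len(z[0]), len(z[0][0]), size(z))
    # prints: "1 1 4 4", then "2 2 8 32", then "4 4 16 256"
```

Under these assumptions, the most abstract level holds 4 values versus 256 for the finest level, which illustrates why generating the upper levels can add relatively little compute. Pooling and channel reduction cap the information a level can carry deterministically, while the noise level adds a tunable, stochastic limit on top of them.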

Primary Area: generative models

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 9258
