Keywords: Diffusion models, efficiency, memorization
TL;DR: We propose a novel diffusion model framework that incorporates an explicit memory mechanism into diffusion modeling, accelerating training by over 50 times on ImageNet 256x256.
Abstract: Conditional diffusion models require external guidance for generation, but common signals like text prompts are often noisy, necessitating prolonged training on massive, high-quality paired datasets.
To address this, we introduce Generative Modeling with Explicit Memory (GMem), a framework that instead conditions generation on high-quality semantic information extracted directly from the data themselves.
These conditioning signals are stored in an external memory bank, providing accurate guidance that substantially accelerates training.
Our experiments on ImageNet $256\times 256$ show that GMem achieves a $50\times$ training speedup over SiT while also reaching a state-of-the-art (SoTA) FID of $1.53$.
The key contributions of our work are threefold:
(i) We demonstrate substantial training acceleration (over $50\times$) on ImageNet.
(ii) We propose an efficient downstream adaptation pathway, where the image-pretrained model serves as a base model for adapting to new tasks.
(iii) We introduce a data- and compute-efficient text-to-image (T2I) pipeline that matches the quality of strong baselines like PixArt-$\alpha$ using only $\frac{1}{17}$ of the data and $\frac{1}{9}$ of the training time.
Our work establishes conditioning with explicit memory as a powerful paradigm for efficient and effective generative modeling.
Our code will be made publicly available.
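To make the idea of conditioning on an explicit memory bank concrete, below is a minimal sketch (not the authors' released implementation) of how a denoiser might be conditioned on per-sample semantic embeddings stored in an external bank. All names (`MemoryBank`, `Denoiser`, `embed_dim`, the flow-matching style training target) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch, assuming per-sample semantic embeddings are precomputed by a
# frozen encoder and stored in an external memory bank that conditions the denoiser.
import torch
import torch.nn as nn


class MemoryBank(nn.Module):
    """Stores one fixed semantic embedding per training sample (hypothetical design)."""

    def __init__(self, num_samples: int, embed_dim: int):
        super().__init__()
        # Filled once from a pretrained encoder, then kept frozen (assumption).
        self.register_buffer("bank", torch.zeros(num_samples, embed_dim))

    def write(self, indices: torch.Tensor, embeddings: torch.Tensor) -> None:
        self.bank[indices] = embeddings

    def read(self, indices: torch.Tensor) -> torch.Tensor:
        return self.bank[indices]


class Denoiser(nn.Module):
    """Toy denoiser that takes the retrieved memory embedding as its conditioning signal."""

    def __init__(self, data_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(data_dim + embed_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, x_t, t, cond):
        # Concatenate noisy sample, memory embedding, and timestep.
        return self.net(torch.cat([x_t, cond, t[:, None]], dim=-1))


# One illustrative training step with a flow-matching style target (assumption).
bank = MemoryBank(num_samples=1000, embed_dim=64)
model = Denoiser(data_dim=32, embed_dim=64)

idx = torch.randint(0, 1000, (8,))
x0 = torch.randn(8, 32)                      # clean latents (placeholder data)
cond = bank.read(idx)                        # guidance retrieved from explicit memory
t = torch.rand(8)
noise = torch.randn_like(x0)
x_t = (1 - t[:, None]) * x0 + t[:, None] * noise
loss = ((model(x_t, t, cond) - (noise - x0)) ** 2).mean()
loss.backward()
```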
Supplementary Material: zip
Primary Area: generative models
Submission Number: 12784