Initialization is Half the Battle: Generating Diverse Images from a Guidance Potential Posterior

Xiang Li; Dianbo Liu; Kenji Kawaguchi

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, Venue: ICML 2026 spotlight, License: CC BY 4.0

TL;DR: Enhancing generative diversity by sampling initial noise from a guidance potential posterior

Abstract: Despite their remarkable fidelity, generative models frequently suffer from mode collapse. Existing strategies for enhancing diversity predominantly intervene during the generation trajectory. We identify a critical oversight: the standard Gaussian initialization is agnostic to the guidance potential landscape, so trajectories often collapse into dominant modes. In this work, we formulate the selection of the initial noise as sampling from a *guidance potential posterior*, which re-weights the prior towards diversity-rich regions. To sample from this distribution efficiently, we introduce *Diversity-inducing Initialization* (DivIn), which leverages Langevin dynamics to actively navigate the initialization landscape, steering the initial noise away from collapsing regions while anchoring it to the valid data manifold. Our method serves as an inference-time diversity enhancement compatible with both diffusion and flow matching models. Extensive experiments show that DivIn achieves superior performance in both class-to-image and text-to-image scenarios. Furthermore, because DivIn is orthogonal to trajectory-based methods, combining the two expands the diversity-quality Pareto frontier beyond what either achieves in isolation.
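
The idea of drawing initial noise from a re-weighted prior with Langevin dynamics can be illustrated with a minimal sketch. This is not the DivIn implementation: the abstract does not specify the guidance potential, so the code below assumes a hypothetical toy potential U(z) = exp(-||z - c||^2 / (2 s^2)) that is high near one "dominant mode" center c, and targets the density p(z) ∝ N(z; 0, I) · exp(-w · U(z)). The names `toy_potential_grad` and `langevin_init`, the weight `w`, the step size, and the number of steps are all illustrative assumptions.

```python
import numpy as np


def toy_potential_grad(z, center, scale=1.0):
    """Gradient of a HYPOTHETICAL potential U(z) = exp(-||z - center||^2 / (2 scale^2)).

    This stands in for a guidance potential; it is not the one used in the paper.
    """
    diff = z - center
    u = np.exp(-np.sum(diff**2, axis=-1, keepdims=True) / (2.0 * scale**2))
    return -diff / scale**2 * u


def langevin_init(n, dim, center, weight=5.0, step=0.01, n_steps=200, seed=0):
    """Unadjusted Langevin dynamics targeting p(z) ∝ N(z; 0, I) * exp(-weight * U(z)).

    The Gaussian prior term (-z) anchors samples to the standard noise
    distribution; the potential term pushes them away from `center`.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, dim))  # start from the standard Gaussian prior
    for _ in range(n_steps):
        # Score of the target: grad log N(z; 0, I) - weight * grad U(z)
        score = -z - weight * toy_potential_grad(z, center)
        noise = rng.standard_normal(z.shape)
        z = z + step * score + np.sqrt(2.0 * step) * noise
    return z


if __name__ == "__main__":
    center = np.zeros(2)
    z = langevin_init(n=2000, dim=2, center=center)
    print("mean distance from the assumed collapsing region:",
          np.linalg.norm(z - center, axis=1).mean())
```

In this toy setting the returned samples would replace the usual Gaussian draw as the starting point of a diffusion or flow matching sampler. A real guidance potential would typically require gradients through the generative model, which this sketch deliberately avoids.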

Lay Summary: AI image generators can create impressive visuals, but they frequently lack true creativity. We discovered a major, overlooked reason for this: the initial random noise often accidentally traps the AI into deep, dominant modes, forcing it to paint similar pictures every time. To solve this, we developed a technique called Diversity-inducing Initialization (DivIn). Instead of relying on a purely random starting point, DivIn actively navigates the generator's mathematical landscape to find "flat valleys" where many different creative paths are possible, gently steering the initial noise away from repetitive traps before the generation process even begins. Because DivIn focuses entirely on preparing the canvas rather than changing how the generator draws, it serves as a simple "plug-and-play" upgrade for modern generative models. It significantly expands the variety of images a generator can produce from a single prompt without sacrificing visual quality, unlocking a new level of creativity for generative AI.

Link To Code: https://github.com/South7X/divin

Primary Area: Probabilistic Methods->Monte Carlo and Sampling Methods

Keywords: Diffusion Model, Flow Matching, Diversity, Stochastic, Langevin Dynamics

Submission Number: 15608
