Track: Regular paper
Keywords: text-to-image generation, diffusion models, prompt optimization, controllable diversity, novelty search, evolutionary algorithms, large language models, creative AI, semantic prompt mutation, CLIP embeddings, image diversity, generative creativity
TL;DR: WANDER is a novelty search framework that leverages LLM-guided prompt evolution and controllable emitters to systematically generate semantically coherent yet diverse image sets, supporting creative exploration with diffusion models.
Abstract: Text-to-image diffusion models are capable of producing diverse outputs, yet discovering prompts that consistently elicit diversity remains challenging. Reusing or lightly editing a prompt often produces near-duplicate generations, limiting their utility for exploration and ideation. We present WANDER, a novelty-search framework that evolves prompts to produce diverse image sets from a single input. WANDER uses a Large Language Model (LLM) to mutate prompts, guided by semantic “emitters” such as altering style, composition, or atmosphere. Image novelty is quantified using CLIP embeddings, ensuring that each generation expands the diversity of the pool while remaining semantically coherent. Experiments with FLUX-DEV for generation and GPT-4o-mini for mutation show that WANDER produces significantly more diverse image sets than existing prompt optimization baselines, while using fewer tokens. Ablations highlight that emitter-guided control is essential for achieving diversity. By framing diversity as a controllable property, WANDER offers a practical, scalable tool for creative exploration with generative models.
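For concreteness, here is a minimal sketch of the CLIP-based novelty scoring the abstract describes. The `novelty_score` helper and the k-nearest-neighbor averaging are illustrative assumptions (a common novelty-search formulation), not necessarily the paper's exact metric; it operates on precomputed, L2-normalized CLIP image embeddings.

```python
import numpy as np

def novelty_score(candidate: np.ndarray, pool: np.ndarray, k: int = 5) -> float:
    """Novelty of a candidate image embedding relative to an existing pool.

    Standard novelty-search scoring: mean cosine distance to the k nearest
    neighbors already in the pool. Assumes rows are L2-normalized CLIP
    embeddings, so cosine similarity reduces to a dot product.
    """
    if len(pool) == 0:
        return 1.0  # an empty pool makes any candidate maximally novel
    sims = pool @ candidate        # cosine similarities, shape (len(pool),)
    dists = 1.0 - sims             # cosine distances
    k = min(k, len(dists))
    nearest = np.sort(dists)[:k]   # k smallest distances = nearest neighbors
    return float(nearest.mean())
```

Under this scheme, a mutated prompt's generated image is added to the pool only if its score expands diversity, which is what drives the search away from near-duplicate generations.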
Submission Number: 26