Energy-Guided Prompt Optimization for Controllable Cross-Architectural Diffusion Models

16 Sept 2025 (modified: 27 Jan 2026) | ICLR 2026 Conference Withdrawn Submission | Readers: Everyone | License: CC BY 4.0
Keywords: Diffusion-based Generative Models, Controllable Image Synthesis, Energy-Guided Prompt Optimization, Cross-Architectural Attribution, Semantic Consistency Metrics
TL;DR: We propose an energy-guided prompt optimization framework that enhances controllability and constraint adherence across diverse diffusion model architectures.
Abstract: Diffusion models are central to text-to-image synthesis, yet enforcing semantic constraints such as exclusion and negation remains challenging across architectures. We propose a unified, training-free intervention that combines diagnostic instrumentation with a principled sampling-time optimizer to improve constraint adherence without modifying pretrained denoisers. The diagnostic module uses latent attribution and Jacobian analysis to reveal sensitivity to textual conditioning and guide conservative hyperparameter initialization. The optimizer shapes a smooth semantic potential on the CLIP manifold and applies Hamiltonian updates with mild stochasticity, enabling manifold-aware corrections and a distributional interpretation via a semantic Fokker–Planck equation. Experiments on multiple diffusion variants and datasets show that inference-time energy shaping significantly improves negative-prompt compliance while preserving perceptual quality. Our approach advances controllable generation by integrating model introspection and theoretically grounded constrained sampling into a lightweight, architecture-agnostic procedure.
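The abstract describes the optimizer only at a high level: a smooth potential, Hamiltonian updates, and mild stochasticity. The following minimal NumPy sketch shows what such a step can look like in its most generic form, an underdamped Langevin update (Hamiltonian dynamics plus friction and injected noise) on the unit sphere. The sphere is a hypothetical stand-in for normalized CLIP embeddings. The toy potential `energy`, the vectors `p` and `n` (hypothetical positive- and negative-prompt embeddings), and the hyperparameters `lam`, `eta`, `gamma` and `temp` are all assumptions for illustration. This is not the submission's optimizer or its potential.

```python
import numpy as np


def tangent(x, z):
    """Project x onto the tangent space of the unit sphere at z."""
    return x - np.dot(x, z) * z


def energy(z, p, n, lam):
    """Hypothetical toy potential: reward alignment with p, penalise squared alignment with n."""
    return -np.dot(z, p) + lam * np.dot(z, n) ** 2


def energy_grad(z, p, n, lam):
    """Euclidean gradient of `energy` with respect to z."""
    return -p + 2.0 * lam * np.dot(z, n) * n


def langevin_step(z, v, p, n, lam, eta, gamma, temp, rng):
    """One underdamped Langevin step (momentum, friction gamma, temperature temp) on the sphere."""
    g = tangent(energy_grad(z, p, n, lam), z)          # Riemannian gradient by projection
    noise = tangent(rng.standard_normal(z.shape), z)   # Gaussian noise restricted to the tangent space
    v = (1.0 - gamma * eta) * v - eta * g + np.sqrt(2.0 * gamma * eta * temp) * noise
    z = z + eta * v
    z = z / np.linalg.norm(z)                          # retract back onto the unit sphere
    v = tangent(v, z)                                  # approximate vector transport by projection
    return z, v


def optimize(z0, p, n, lam=2.0, eta=0.05, gamma=1.0, temp=1e-4, steps=500, seed=0):
    """Run `steps` Langevin updates from z0 and return the final unit vector."""
    rng = np.random.default_rng(seed)
    z = z0 / np.linalg.norm(z0)
    v = np.zeros_like(z)
    for _ in range(steps):
        z, v = langevin_step(z, v, p, n, lam, eta, gamma, temp, rng)
    return z


if __name__ == "__main__":
    d = 8
    p = np.eye(d)[0]                                   # hypothetical "positive" direction
    n = (np.eye(d)[0] + np.eye(d)[1]) / np.sqrt(2.0)   # hypothetical "negative" direction, correlated with p
    z0 = n.copy()                                      # start exactly on the excluded concept
    z = optimize(z0, p, n)
    print("energy before:", energy(z0, p, n, 2.0), "after:", energy(z, p, n, 2.0))
    print("cos(z, p):", np.dot(z, p), "cos(z, n):", np.dot(z, n))
```

The normalize-then-project scheme is a simple retraction that ignores curvature correction terms, which is adequate for a small step size `eta`. As `temp` goes to 0 the update reduces to damped heavy-ball descent on the potential. For the continuous-time Euclidean version of these dynamics, the joint density over position and momentum evolves under the Kramers (Fokker–Planck) equation, whose stationary density is proportional to exp(-(E(z) + |v|^2/2) / temp).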
Supplementary Material: zip
Primary Area: generative models
Submission Number: 7245