IPGO: Indirect Prompt Gradient Optimization for Text-to-Image Model Prompt Finetuning

ICLR 2026 Conference Submission14286 Authors

18 Sept 2025 (modified: 03 Dec 2025)
Keywords: T2I Diffusion Model, human feedback alignment, parameter-efficient
TL;DR: We propose IPGO, a new parameter-efficient framework for image optimization on T2I diffusion models.
Abstract: Text-to-Image (T2I) diffusion models have become the state of the art for image generation, yet they often fail to align with specific reward criteria such as aesthetics or human preference. We propose Indirect Prompt Gradient Optimization (IPGO), a novel model-independent framework that enhances prompt embeddings by injecting learnable text embeddings as a prefix and suffix around the original prompt embeddings. IPGO leverages reduced-rank approximation and rotation, while enforcing range and orthonormality constraints and a conformity penalty to ensure stability and mitigate reward hacking. We evaluate IPGO against six baseline methods under single-prompt optimization with three reward models targeting image-text alignment, image aesthetics, and human preference, across three datasets of varying prompt complexity. The results show that IPGO consistently outperforms all baselines, including strong competitors such as DRaFT-1 and TextCraftor. We investigate and mitigate reward hacking, in particular for aesthetics. Ablation studies further highlight the contributions of each IPGO component, while additional experiments demonstrate IPGO's applicability to various T2I diffusion models.
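The abstract's core mechanism, learnable prefix and suffix embeddings injected around frozen prompt embeddings, with the learnable part parameterized by a reduced-rank factorization and an orthonormal rotation, can be sketched as follows. This is a minimal illustration assuming CLIP-like dimensions (77 tokens, 768-dim embeddings) and hypothetical prefix/suffix lengths and rank; the actual IPGO parameterization, constraints, and penalty terms are defined in the paper itself.

```python
import numpy as np

# Hypothetical dimensions: 77-token prompt, 768-dim text embeddings (CLIP-like).
seq_len, d = 77, 768
n_prefix, n_suffix, rank = 5, 5, 4  # illustrative choices, not the paper's values

rng = np.random.default_rng(0)
prompt_emb = rng.normal(size=(seq_len, d))  # frozen embeddings of the original prompt

# Learnable prefix/suffix via a reduced-rank factorization E = A @ B (rank << d),
# combined with a rotation R kept orthonormal via QR decomposition (a sketch of
# the "reduced-rank approximation and rotation" idea from the abstract).
A_pre = rng.normal(scale=0.01, size=(n_prefix, rank))
B_pre = rng.normal(scale=0.01, size=(rank, d))
A_suf = rng.normal(scale=0.01, size=(n_suffix, rank))
B_suf = rng.normal(scale=0.01, size=(rank, d))
R, _ = np.linalg.qr(rng.normal(size=(d, d)))  # orthonormal: R.T @ R == I

def build_augmented_embedding():
    """Concatenate learnable prefix, frozen prompt, and learnable suffix."""
    prefix = (A_pre @ B_pre) @ R
    suffix = (A_suf @ B_suf) @ R
    return np.concatenate([prefix, prompt_emb, suffix], axis=0)

aug = build_augmented_embedding()
print(aug.shape)  # (87, 768): prefix + original prompt + suffix tokens
```

In a real setup the augmented embedding would be fed to the diffusion model's text conditioning path, and only the low-rank factors and rotation would receive gradients from the reward model, leaving the T2I model and the prompt itself untouched.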
Primary Area: generative models
Submission Number: 14286