IPGO: Indirect Prompt Gradient Optimization for Text-to-Image Model Prompt Finetuning

ICLR 2026 Conference Submission14286 Authors

18 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: T2I Diffusion Model, human feedback alignment, parameter-efficient
TL;DR: We propose IPGO, a new parameter-efficient framework for image optimization on T2I diffusion models.
Abstract: Text-to-Image (T2I) diffusion models have become the state of the art for image generation, yet they often fail to align with specific reward criteria such as aesthetics or human preference. We propose Indirect Prompt Gradient Optimization (IPGO), a novel, parameter-efficient framework that enhances prompt embeddings by injecting a few learnable text embeddings as a prefix and suffix around the original prompt embeddings. IPGO leverages low-rank approximation and rotation, while enforcing range, orthonormality, and conformity constraints to ensure stability. We evaluate IPGO against six baseline methods under prompt-wise training, using three reward models targeting image aesthetics, image-text alignment, and human preference, across three datasets of varying prompt complexity. The results show that, despite running on a single NVIDIA L4 GPU and using over 250 times fewer parameters, IPGO consistently outperforms all baselines, including strong competitors such as DRaFT-1 and TextCraftor. Ablation studies further highlight the contributions of each IPGO component and optimization constraint, while additional experiments demonstrate IPGO's adaptability across various T2I diffusion models.
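The core mechanism the abstract describes, wrapping frozen prompt embeddings with a few learnable prefix and suffix embeddings parameterized in low-rank form, can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: the dimensions, token counts, rank, and the `augment` helper are all assumptions chosen for clarity, and the rotation and constraint machinery (range, orthonormality, conformity) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper):
d = 768                  # text-embedding dimension, e.g. a CLIP text encoder
n_prefix, n_suffix = 4, 4  # number of learnable prefix/suffix tokens
r = 8                    # low-rank dimension

# Each learnable block of k embeddings is parameterized as U (k x r) @ V (r x d),
# which uses far fewer parameters than a full k x d matrix when r is small.
U_pre, V_pre = rng.normal(size=(n_prefix, r)), rng.normal(size=(r, d))
U_suf, V_suf = rng.normal(size=(n_suffix, r)), rng.normal(size=(r, d))

def augment(prompt_emb):
    """Concatenate learnable low-rank prefix/suffix embeddings around the
    frozen prompt embeddings (n_tokens x d). Only the U/V factors would
    receive gradients from a reward model; the prompt embeddings stay fixed."""
    prefix = U_pre @ V_pre
    suffix = U_suf @ V_suf
    return np.concatenate([prefix, prompt_emb, suffix], axis=0)

prompt = rng.normal(size=(77, d))  # stand-in for a text encoder's output
aug = augment(prompt)
print(aug.shape)  # sequence grows by n_prefix + n_suffix tokens

# Trainable parameter count under this low-rank parameterization:
n_params = U_pre.size + V_pre.size + U_suf.size + V_suf.size
print(n_params)
```

In a full pipeline, `aug` would replace the encoder output fed to the diffusion model's cross-attention, and the factors would be updated by backpropagating a differentiable reward through the sampler; the low-rank structure is what keeps the trainable footprint small.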
Primary Area: generative models
Submission Number: 14286