Automatic Prompt Engineering for Scalable Prompt Inversion in Text-to-Image Ad Generation

Zixin Ding; Qi Zeng; Boying Gong; Wenlong Deng; Bo Pan; Yuxin Chen


Published: 18 Apr 2026, Last Modified: 24 Apr 2026

Venue: ACL 2026 Industry Track Poster

License: CC BY 4.0

Keywords: Automatic Prompt Engineering, Prompt Inversion, T2I models

TL;DR: PRISM-DUEL leverages pairwise VLM feedback and dueling-bandit optimization to generate high-fidelity, diverse T2I image variants without requiring model retraining.

Abstract: While prompt engineering offers effective control over Text-to-Image (T2I) generation, it remains labor-intensive for large-scale production. We present PRISM-DUEL, a black-box framework that formalizes prompt optimization as Automatic Prompt Engineering (APE), motivated by advertising workflows that require low-latency, diverse variants faithful to a human-designed ad. Since zero-shot LLMs are unreliable judges of image quality, PRISM-DUEL obtains label-free pairwise preferences and rationales from an LLM judge over pairs of generated images, then uses a dueling-bandit optimizer to refine a prompt that generates controlled variations while matching the reference ad's visual content. By iteratively steering the prompt distribution towards higher-quality generations and improving posterior calibration, PRISM-DUEL preserves visual similarity and semantic faithfulness while increasing diversity. Experiments on PartiPrompts and DreamBooth across Gemini 2.5 Flash Image, FLUX.1, and Qwen-Image show consistent gains over strong baselines in visual faithfulness and prompt interpretability.
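
To make the dueling-bandit idea in the abstract concrete, here is a minimal sketch of selecting among candidate prompts using only pairwise preferences. It is not the authors' implementation: the abstract does not say which dueling-bandit algorithm PRISM-DUEL uses, so this sketch assumes a simplified Thompson-sampling scheme with Copeland scoring. The names `duel_prompts` and `judge` are hypothetical, and `judge` is a stand-in callable for the LLM judge that would compare images generated from two prompts.

```python
import math
import random


def duel_prompts(candidates, judge, rounds=300, seed=0):
    """Return the preferred candidate using only pairwise comparisons.

    `judge(a, b)` is a hypothetical callable returning True when `a` is
    preferred over `b`. In a real pipeline it would wrap an image
    generator plus a pairwise judge; here it is left abstract.
    """
    rng = random.Random(seed)
    n = len(candidates)
    wins = [[0] * n for _ in range(n)]  # wins[i][j]: times i beat j

    def sample_beats(i, j):
        # Thompson sample of P(i beats j) from a Beta posterior (uniform prior).
        return rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)

    for _ in range(rounds):
        # Sample one consistent preference matrix.
        theta = [[0.5] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                theta[i][j] = sample_beats(i, j)
                theta[j][i] = 1.0 - theta[i][j]
        # First arm: highest sampled Copeland score, ties broken at random.
        copeland = [sum(theta[i][j] > 0.5 for j in range(n)) for i in range(n)]
        first = max(range(n), key=lambda i: (copeland[i], rng.random()))
        # Second arm: the opponent freshly sampled as most likely to beat it.
        second = max(
            (j for j in range(n) if j != first),
            key=lambda j: sample_beats(j, first),
        )
        if judge(candidates[first], candidates[second]):
            wins[first][second] += 1
        else:
            wins[second][first] += 1

    # Final pick: empirical Copeland winner, ties broken by total wins.
    def score(i):
        beaten = sum(wins[i][j] > wins[j][i] for j in range(n) if j != i)
        return (beaten, sum(wins[i]))

    return candidates[max(range(n), key=score)]


if __name__ == "__main__":
    # Simulated noisy judge (assumption): hidden quality scores with
    # Bradley-Terry preference probabilities, purely for demonstration.
    quality = {"prompt A": 0.0, "prompt B": 0.5, "prompt C": 1.0, "prompt D": 3.0}
    noise = random.Random(1)

    def noisy_judge(a, b):
        p = 1.0 / (1.0 + math.exp(quality[b] - quality[a]))
        return noise.random() < p

    print(duel_prompts(list(quality), noisy_judge))
```

The sketch only ranks a fixed candidate pool. The framework described above also uses the judge's rationales and steers the prompt distribution iteratively, which this example does not attempt to model.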

Submission Type: Emerging


Submission Number: 393
