Keywords: Text-to-image generation, Diffusion Models, Object Quantification, Prompt Optimization
TL;DR: We propose QuantiShift, a shift-aware prompting framework that adapts to different shift types without diffusion model retraining.
Abstract: Accurately quantifying objects with text-to-image diffusion models remains challenging, especially under distribution shifts. Existing methods struggle to maintain numerical precision across varying object categories, count distributions, and visual domains. To address this, we propose QuantiShift, a shift-aware prompting framework that adapts to different shift types without diffusion model retraining. QuantiShift introduces shift-aware prompt optimization, where distinct prompt components explicitly tackle number shifts, label shifts, and covariate shifts, ensuring precise object quantification across varying distributions. To further enhance generalization, we propose consistency-guided any-shift prompting, which enforces alignment between textual prompts and generated images by mitigating inconsistencies caused by distribution shifts. Finally, we develop hierarchical prompt optimization, a two-stage refinement process that first adapts prompts to individual shifts and then calibrates them for cross-shift generalization. To evaluate robustness, we introduce a new benchmark designed to assess object quantification under diverse shifts. Extensive experiments demonstrate that QuantiShift achieves state-of-the-art performance, considerably improving accuracy and robustness over existing methods.
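To make the hierarchical, shift-aware prompt optimization described in the abstract concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the backend calls (`generate_image`, `count_objects`), the candidate prompt pools, and the simple random-search loop are all hypothetical stand-ins for whatever optimizer and consistency objective the paper actually uses; only the two-stage structure (per-shift adaptation, then cross-shift calibration with a prompt-image consistency score) mirrors the abstract.

```python
# Hypothetical sketch of QuantiShift-style hierarchical prompt optimization.
# Function names, candidate pools, and the random-search strategy are
# illustrative placeholders, not the paper's actual method.

import random

SHIFT_TYPES = ["number", "label", "covariate"]


def compose_prompt(base, shift_components, shared):
    """Concatenate the base request with shift-specific and shared prompt parts."""
    return " ".join([base] + [shift_components[s] for s in SHIFT_TYPES] + [shared]).strip()


def consistency_score(prompt, target_count, generate_image, count_objects):
    """Score a prompt by how closely the generated image matches the requested count."""
    image = generate_image(prompt)          # assumed text-to-image backend
    predicted = count_objects(image)        # assumed off-the-shelf object counter
    return -abs(predicted - target_count)   # higher is better; 0 means an exact match


def optimize_prompts(base, target_count, candidates, generate_image, count_objects,
                     stage1_iters=20, stage2_iters=10):
    """Two-stage refinement: per-shift adaptation, then cross-shift calibration."""
    components = {s: "" for s in SHIFT_TYPES}
    shared = ""

    # Stage 1: adapt each shift-specific prompt component independently.
    for shift in SHIFT_TYPES:
        best, best_score = "", float("-inf")
        for _ in range(stage1_iters):
            trial = dict(components, **{shift: random.choice(candidates[shift])})
            score = consistency_score(compose_prompt(base, trial, shared),
                                      target_count, generate_image, count_objects)
            if score > best_score:
                best, best_score = trial[shift], score
        components[shift] = best

    # Stage 2: calibrate a shared component for cross-shift generalization.
    best_shared, best_score = "", float("-inf")
    for _ in range(stage2_iters):
        trial_shared = random.choice(candidates["shared"])
        score = consistency_score(compose_prompt(base, components, trial_shared),
                                  target_count, generate_image, count_objects)
        if score > best_score:
            best_shared, best_score = trial_shared, score

    return compose_prompt(base, components, best_shared)
```

In this reading, stage 1 freezes the shared component while searching each shift-specific component in isolation, and stage 2 only tunes the shared component with all shift-specific parts fixed, which is one plausible way to realize "first adapts prompts to individual shifts and then calibrates them for cross-shift generalization."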
Primary Area: foundation or frontier models, including LLMs
Submission Number: 2968