Prompt Optimizer for Text-to-Image Generation: Utilizing Chain-of-Thought Reasoning to Optimize Prompt Design
Abstract: Text-to-image generation models have attracted considerable attention for their ability to create images from text prompts. However, natural language prompts are often concise and ambiguous, which makes it difficult to consistently produce high-quality images that meet user expectations. In this work, we investigate the capabilities of large language models in image generation and introduce Prompt Optimizer, a method that uses large language models for prompt augmentation. In experiments on the Pick-a-Pic and COCO datasets, we use an improved aesthetic predictor and PickScore as metrics for image quality and text-image relevance. Compared with direct generation and other text-to-image prompt generation methods, our method achieves significant improvements in relevance and generation quality.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: cross-modal content generation
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 1802