Abstract: Modern text-to-image generation models can produce realistic, high-quality images. However, user prompts often contain ambiguities, making it difficult for these systems to interpret users' actual intentions. Consequently, users typically have to modify their prompts several times before the generated images meet their expectations.
Although some prior work refines prompts so that the generated images better align with user requirements, comprehending the true needs of users, particularly non-experts, remains challenging for these models.
In this research, we aim to enhance the visual parameter-tuning process so that the model is user-friendly for individuals without specialized knowledge and better understands user needs.
We propose a human-machine co-adaptation strategy that takes maximizing the mutual information between the user's prompts and the images under modification as its optimization target, so that the system better adapts to user needs. We find that the improved model reduces the need for multiple rounds of adjustment. We also collect a multi-round dialogue dataset containing prompt-image pairs annotated with user intent. Various experiments demonstrate the effectiveness of the proposed method on this dataset.
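The abstract does not specify how the mutual information between prompts and images is estimated or maximized. As one illustration only, a contrastive (InfoNCE-style) lower bound is a common way to maximize mutual information between paired embeddings; the encoder outputs, batch layout, and temperature below are assumptions for this sketch, not the paper's actual implementation.

```python
# A minimal sketch, assuming prompts and images have already been encoded
# into fixed-size embeddings (architecture hypothetical, not from the paper).
import torch
import torch.nn.functional as F

def info_nce_loss(prompt_emb: torch.Tensor,
                  image_emb: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Estimate a lower bound on I(prompt; image) over a batch.

    prompt_emb, image_emb: (batch, dim) embeddings of matched
    prompt/image pairs; off-diagonal pairs serve as negatives.
    Minimizing this loss maximizes the InfoNCE bound on mutual information.
    """
    prompt_emb = F.normalize(prompt_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = prompt_emb @ image_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: each matched pair should score highest
    # in both the prompt-to-image and image-to-prompt directions.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```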
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: User-Friendly Image Generation, Human-Machine Co-Adaptation, Visual Parameter-Tuning Optimization
Contribution Types: NLP engineering experiment
Languages Studied: Python
Submission Number: 1487