Personalizing text-to-image generation with visual prompts using BLIP-2

02 Apr 2023 (modified: 15 Jun 2023) · KAIST Spring 2023 AI618 Submission
Keywords: Text-to-Image Generation, Text-guided synthesis, Personalization
Abstract: In recent years, text-to-image generation has received significant attention as researchers aim to automatically generate realistic images from textual descriptions. Despite the promising results of diffusion models in producing high-quality images, they often struggle to capture the richness and diversity of natural language expressions, making it difficult to generate images that align with user intentions. Personalization has been proposed as one way to tackle this challenge: a pre-trained text-to-image model is fine-tuned on user-provided images containing a specific concept or subject, enabling it to generate images with the desired subject or style. However, existing personalization techniques typically require directly fine-tuning the text encoder, the diffusion model, or both, which incurs a high computational cost for each user-provided concept and can compromise the model's prior knowledge. This paper introduces a novel approach to personalizing a text-to-image model by leveraging a BLIP-2 encoder. We feed an image containing the objects we wish to generate with the Stable Diffusion model into the BLIP-2 encoder, and use the output queries of the BLIP-2 Q-Former as visual prompts that guide Stable Diffusion to generate images capturing the visual representation of the input image.
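
The snippet below is a minimal sketch of the conditioning pipeline described in the abstract, not the exact training or inference code of this work. It assumes the Hugging Face `Salesforce/blip2-opt-2.7b` and `runwayml/stable-diffusion-v1-5` checkpoints, a CUDA device, a hypothetical user image `concept.jpg`, and an untrained `torch.nn.Linear` standing in for whatever learned mapping connects the Q-Former output space to the UNet cross-attention space; the number of denoising steps is likewise illustrative.

```python
# Sketch: BLIP-2 Q-Former output queries as visual prompts for Stable Diffusion.
# Checkpoints, the projection layer, and the file name are illustrative assumptions.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2Model
from diffusers import StableDiffusionPipeline

device, dtype = "cuda", torch.float16  # assumes a CUDA-capable GPU

# 1) Extract Q-Former query embeddings from the user-provided concept image.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
blip2 = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=dtype).to(device)

image = Image.open("concept.jpg").convert("RGB")  # hypothetical input image
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device, dtype)
with torch.no_grad():
    qformer_out = blip2.get_qformer_features(pixel_values=pixel_values)
queries = qformer_out.last_hidden_state  # (1, 32, 768): one embedding per learned query token

# 2) Project the queries into the UNet cross-attention space and treat them as the
#    conditioning sequence (visual prompt) in place of the text-encoder embeddings.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)
proj = torch.nn.Linear(queries.shape[-1], pipe.unet.config.cross_attention_dim).to(device, dtype)
visual_prompt = proj(queries)  # untrained here; in practice this mapping would be learned

# 3) Denoise with the visual prompt passed as encoder_hidden_states to the UNet.
scheduler = pipe.scheduler
scheduler.set_timesteps(50)
latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64, device=device, dtype=dtype)
latents = latents * scheduler.init_noise_sigma
with torch.no_grad():
    for t in scheduler.timesteps:
        latent_in = scheduler.scale_model_input(latents, t)
        noise_pred = pipe.unet(latent_in, t, encoder_hidden_states=visual_prompt).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    # Decode latents back to image space; output values are in roughly [-1, 1].
    image_out = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
```

In this sketch the visual prompt simply replaces the text embeddings in the UNet's cross-attention; combining it with a text prompt (e.g., by concatenating the projected queries with text-encoder outputs) would be a natural variant under the same conditioning interface.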