Keywords: Trustworthy AI, Text-to-image generation, Retrieval-augmented generation, AI safety, Image editing
TL;DR: We define hallucination in text-to-image generation as the generation of images that contradict reality, and propose a retrieval-based method to address it.
Abstract: Text-to-image generation has shown remarkable progress with the emergence of diffusion models. However, these models fail to reflect the factual information and common sense inherent in input text prompts, and therefore generate factually inconsistent images. We define this issue as ‘image hallucination’. Drawing on studies of hallucination in language models, we categorize the problem into three types and propose a method that uses factual images retrieved from an external memory to generate realistic images. Depending on the target of the hallucination, we use either InstructPix2Pix or IP-Adapter, each of which exploits the factual information in the retrieved images differently. This allows us to generate images that accurately reflect the facts and common sense contained in the input text prompts.
Submission Number: 8