Keywords: RAG, image generation, rare-concept generation
TL;DR: Rare-concept image generation using dynamically retrieved image references.
Abstract: While recent generative models synthesize high-quality visual content, they still struggle with generating rare or fine-grained concepts.
To address this challenge, we explore the use of Retrieval-Augmented Generation (RAG) for image generation, and introduce ImageRAG, a training-free method for rare-concept generation.
Using a Vision Language Model (VLM), ImageRAG dynamically identifies generation gaps between an input prompt and a generated image, retrieves relevant images, and uses them as context to guide the generation process.
Prior approaches that use retrieved images require training models specifically for retrieval-based generation. In contrast, ImageRAG leverages the conditioning capabilities of existing image-conditioned models and requires no RAG-specific training.
We demonstrate that our approach is highly adaptable by evaluating it over different backbones, including models trained to accept image inputs and models augmented with a post-training image-prompt adapter.
Through extensive quantitative, qualitative, and subjective evaluation, we show that incorporating retrieved references consistently improves the generation of rare and fine-grained concepts across three datasets and three generative models.
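To make the pipeline described above concrete, here is a minimal Python sketch of the ImageRAG loop. It is an illustration under stated assumptions, not the paper's released implementation: `generate`, `find_missing_concepts`, and `retrieve` are hypothetical placeholders standing in for the image-conditioned backbone, the VLM gap-detection step, and the image retriever, respectively.

```python
from typing import Callable, List

# Hypothetical interfaces (placeholders, not the paper's API):
#   generate(prompt, references) -> image : an image-conditioned generator
#   find_missing_concepts(prompt, image) -> concepts the image fails to
#       depict, as judged by a VLM; an empty list means the image matches
#   retrieve(concept, k) -> top-k reference images for a concept

def image_rag(
    prompt: str,
    generate: Callable[[str, List[object]], object],
    find_missing_concepts: Callable[[str, object], List[str]],
    retrieve: Callable[[str, int], List[object]],
    k: int = 1,
) -> object:
    """Training-free retrieval-augmented image generation (sketch)."""
    # 1. Generate an initial image from the text prompt alone.
    image = generate(prompt, [])

    # 2. Use a VLM to identify generation gaps: concepts from the
    #    prompt that the generated image does not depict.
    missing = find_missing_concepts(prompt, image)
    if not missing:
        return image  # first attempt already matches the prompt

    # 3. Retrieve reference images for each missing concept.
    references = [r for concept in missing for r in retrieve(concept, k)]

    # 4. Regenerate, passing the retrieved references as image context
    #    to guide the generation process.
    return generate(prompt, references)
```

Because the loop only calls a generator that already accepts image conditioning, no RAG-specific training is needed; swapping backbones amounts to swapping the `generate` callable.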
Primary Area: generative models
Submission Number: 5918