MultiRef: Controllable Image Generation with Multiple Visual References

Published: 06 May 2025 · Last Modified: 06 May 2025 · Venue: SynData4CV · License: CC BY 4.0
Keywords: Controllable image generation, multi-images-to-image, unified models, Benchmark, Dataset
TL;DR: We introduce MultiRef, a new task and benchmark for controllable image generation from multiple visual references, revealing that current models struggle to effectively combine diverse visual inputs.
Abstract: Visual designers naturally draw inspiration from multiple visual references, combining diverse elements and aesthetic principles to create artwork. However, current image generative frameworks predominantly rely on single-source inputs, either text prompts or individual reference images. In this paper, we present a new task called MultiRef, which focuses on controllable image generation using multiple visual references. To support this task, we further introduce MultiRef-Bench, a rigorous evaluation framework comprising 990 synthetic and 1,000 real-world generation samples that require incorporating visual content from multiple reference images. The synthetic samples are generated by our data engine, covering 10 reference types and 32 reference combinations. For assessment, we integrate both rule-based metrics and a fine-tuned MLLM-as-a-Judge model into MultiRef-Bench. Our experiments across three interleaved image-text models (i.e., OmniGen, ACE, and Show-o) and six agentic frameworks (e.g., ChatDiT and LLM + SD) reveal that even state-of-the-art systems struggle with multi-reference conditioning: the best model, OmniGen, achieves only 66.6% on synthetic samples and 79.0% on real-world cases on average when compared against the golden answers. These findings provide valuable directions for developing more flexible and human-like creative tools that can effectively integrate multiple sources of visual inspiration.
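As a rough illustration of the evaluation protocol described in the abstract, the sketch below combines a rule-based score with an MLLM-as-a-Judge score into a single per-sample result. All function names, signatures, and the equal weighting are hypothetical assumptions for illustration only; they are not taken from the MultiRef-Bench implementation.

```python
# Hypothetical sketch of per-sample scoring that mixes rule-based checks
# with an MLLM-judge rating. Names and weights are illustrative only.

def rule_based_score(generated_image, reference_images) -> float:
    """Placeholder for rule-based checks (e.g., whether content from each
    reference image is present in the output). Returns a value in [0, 1]."""
    raise NotImplementedError  # assumed helper, not part of MultiRef-Bench

def mllm_judge_score(generated_image, instruction) -> float:
    """Placeholder for a fine-tuned MLLM-as-a-Judge rating of instruction
    adherence. Returns a value in [0, 1]."""
    raise NotImplementedError  # assumed helper, not part of MultiRef-Bench

def evaluate_sample(generated_image, reference_images, instruction,
                    w_rule: float = 0.5, w_judge: float = 0.5) -> float:
    """Combine both signals into one score for a single benchmark sample."""
    rule = rule_based_score(generated_image, reference_images)
    judge = mllm_judge_score(generated_image, instruction)
    return w_rule * rule + w_judge * judge
```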
Submission Number: 18