Conceptrol: Concept Control of Zero-shot Personalized Image Generation

07 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: text-to-image generation, diffusion model, subject-driven generation
TL;DR: Conceptrol is a free lunch that elicits the personalization ability of zero-shot adapters by transforming the image condition into a visual specification constrained by a textual concept, even outperforming fine-tuning methods.
Abstract: Personalized image generation with text-to-image diffusion models generates unseen images based on reference image content. Zero-shot adapter methods such as IP-Adapter and OminiControl are especially interesting because they do not require test-time fine-tuning. However, they struggle to balance preserving personalized content and adhering to the text prompt. We identify a critical design flaw behind this performance gap: current adapters inadequately integrate reference images with textual descriptions. The generated images therefore tend to replicate the reference or misidentify the personalized target. Yet the base text-to-image model has strong conceptual understanding capabilities that can be leveraged. We propose Conceptrol, a simple yet effective framework that enhances zero-shot adapters without adding computational overhead. Conceptrol constrains the attention of the visual specification with a textual concept mask, which improves subject-driven generation capabilities. It achieves as much as an 89% improvement on personalization benchmarks over the vanilla IP-Adapter and can even outperform fine-tuning approaches such as DreamBooth LoRA.
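The abstract describes constraining the image condition's attention with a textual concept mask. As a rough illustration of that idea (this is a hypothetical simplification for intuition only, not the authors' implementation; all names and shapes here are assumptions), one could gate the image cross-attention by the attention map of the concept token from the text branch:

```python
import numpy as np

def concept_masked_attention(attn_text, attn_image, concept_idx):
    """Illustrative sketch: gate image-condition attention by where the
    textual concept token attends (hypothetical simplification of a
    concept mask; not the paper's actual implementation).

    attn_text:  (num_queries, num_text_tokens)  text cross-attention weights
    attn_image: (num_queries, num_image_tokens) image cross-attention weights
    concept_idx: index of the concept token in the text prompt
    """
    # Attention each spatial query pays to the concept token.
    concept_map = attn_text[:, concept_idx]            # (num_queries,)
    # Normalize to [0, 1] so the mask only attenuates, never amplifies.
    mask = concept_map / (concept_map.max() + 1e-8)
    # Restrict the reference-image influence to concept-relevant regions.
    return attn_image * mask[:, None]
```

The intent of such a mask is that the reference image only steers generation where the text prompt's concept token is active, rather than overriding the whole image, which matches the paper's stated goal of balancing content preservation against prompt adherence.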
Primary Area: generative models
Submission Number: 2812