One-for-All: Towards Human-Centric Multi-Subject Customization from Single-Subject Examples

15 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Human Image Generation, Multi-Subject Customization, Diffusion Models
Abstract: Human-centric multi-subject customization remains a key challenge in subject-driven image synthesis. A primary obstacle lies in curating paired multi-subject data, which is labor-intensive and often introduces subject inconsistencies that hinder effective model learning. In this paper, we introduce One-for-All, a framework that pioneers a new paradigm by learning multi-subject consistency from only real-world, single-subject examples, breaking the dependency on curated multi-subject data. Building on this, we unlock the full potential of this paradigm shift with two key designs that ensure robust multi-subject consistency. First, a Center-Aligned Cross-Modal Position Association module guides the interaction between visual references and their textual descriptions; this interaction facilitates intra-subject semantic grounding across cross-modal conditions and strengthens their synergistic contribution to subject consistency. Second, to alleviate the attention dilution caused by the increased number of tokens from multiple subjects, a Dynamic Attention Modulation mechanism dynamically predicts and applies token-wise attention weights, keeping attention focused on critical features and thereby preserving multi-subject consistency. Comprehensive experiments demonstrate that our method, despite being trained exclusively on single-subject data, generalizes robustly across varying numbers of reference subjects and surpasses all baseline methods trained on curated multi-subject data pairs.
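The abstract names the two mechanisms but gives no implementation details, so the following is a minimal PyTorch sketch of one plausible reading: reference tokens receive position ids centered on their subject's text span (the position-association idea), and a learned token-wise gate rescales attention logits so extra reference tokens do not dilute attention (the modulation idea). Every name here (`center_aligned_position_ids`, `DynamicAttentionModulation`, the `gate` projection) is a hypothetical illustration, not the authors' code.

```python
# Hypothetical sketch of the two mechanisms described in the abstract;
# interfaces and names are assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def center_aligned_position_ids(text_span, num_ref_tokens):
    """One reading of Center-Aligned Cross-Modal Position Association:
    assign a subject's visual reference tokens position ids centered on
    the span of its textual description, so attention can ground the
    image tokens to the matching words."""
    start, end = text_span  # token indices of the subject's text phrase
    center = (start + end) / 2.0
    # Spread reference-token positions symmetrically around the text center.
    offsets = torch.arange(num_ref_tokens, dtype=torch.float) - (num_ref_tokens - 1) / 2.0
    return center + offsets

class DynamicAttentionModulation(nn.Module):
    """One reading of Dynamic Attention Modulation: predict a positive
    per-key weight and fold it into the attention logits, so critical
    tokens keep their share of attention as more subjects are added."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # token-wise scalar weight

    def forward(self, q, k, v):
        # q: (B, Lq, D) query tokens; k, v: (B, Lk, D) text + reference tokens
        logits = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5  # (B, Lq, Lk)
        w = F.softplus(self.gate(k)).squeeze(-1)               # (B, Lk), > 0
        logits = logits + torch.log(w).unsqueeze(1)            # reweight keys
        return F.softmax(logits, dim=-1) @ v
```

Under this reading, adding logits by log(w) is equivalent to multiplying each key's pre-softmax attention mass by w, which is one standard way to counter dilution when the key sequence grows; whether the paper modulates logits, post-softmax weights, or values is not stated in the abstract.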
Primary Area: generative models
Submission Number: 5821