Prism: A Composable Person Image Synthesis Model with Compositional Consistency and Unified Optimization

ICLR 2026 Conference Submission 17435 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: image generation

Abstract: While multi-subject reference generation has advanced rapidly, conditional image generation focused on human-environment interaction, particularly person-centric multi-conditional generation, has received comparatively little attention. This domain encompasses multi-subject referencing, portrait synthesis, and scene guidance. To address this gap, we introduce Prism, a unified architecture designed to generate coherent images that satisfy all input conditions, even in the absence of textual prompts. Prism preserves identity and facial characteristics while aligning with specified backgrounds. To address the scarcity of aligned reference and target image sets, we develop a dedicated pipeline, termed HMS-Dataset, that constructs a large-scale training dataset from single images containing individuals. Building on this, Prism first encodes facial identity, relevant clothing elements, and background context into sequences, which are then fused via a novel MM-Attention mechanism. Furthermore, we propose Compositional Consistency Losses (CCL), which combine facial similarity, clothing feature preservation, and background consistency terms designed to boost facial fidelity, retain intricate clothing details, and enhance overall background coherence. Guided by the Minimum Variance Distortionless Response (MVDR) criterion, we then propose a Unified Gradient Optimization (UGO) update strategy that enables fair perceptual optimization in multi-objective settings. Ultimately, Prism demonstrates robust identity preservation and seamless human-environment interaction. Evaluated on our proposed PrismBench, Prism achieves state-of-the-art fidelity and controllability, advancing practical applications in character editing and customizable scene synthesis.
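
The abstract does not say how the Minimum Variance Distortionless Response criterion is applied to gradients, so the following is only an illustrative sketch of the general idea, not the authors' UGO method. It uses the textbook MVDR solution w = R^{-1} a / (a^T R^{-1} a), which minimizes w^T R w subject to w^T a = 1. As assumptions made here for illustration, R is taken to be the Gram matrix of the per-loss gradients, a is an all-ones vector, and a small ridge term keeps R invertible. The function names `solve_linear`, `mvdr_weights`, and `combine` are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mvdr_weights(gradients, ridge=1e-6):
    """Return weights w = R^{-1} a / (a^T R^{-1} a) for a list of flat gradients.

    Assumption (not from the paper): R is the Gram matrix of the gradients
    and a is the all-ones vector, so the weights sum to one.
    """
    k = len(gradients)
    gram = [[sum(u * v for u, v in zip(gi, gj)) for gj in gradients]
            for gi in gradients]
    for i in range(k):
        gram[i][i] += ridge  # ridge term keeps the system well conditioned
    r_inv_a = solve_linear(gram, [1.0] * k)
    denom = sum(r_inv_a)  # equals a^T R^{-1} a when a is all ones
    return [v / denom for v in r_inv_a]


def combine(gradients, weights):
    """Weighted sum of the per-loss gradients."""
    dim = len(gradients[0])
    return [sum(w * g[d] for w, g in zip(weights, gradients)) for d in range(dim)]
```

For example, with two orthogonal gradients `[1.0, 0.0]` and `[0.0, 2.0]`, `mvdr_weights` returns weights close to 0.8 and 0.2, so the larger-norm gradient is down-weighted and neither objective dominates the combined update.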

Primary Area: generative models

Submission Number: 17435
