MultiPersona-Align: Zero-Shot Multi-Subject Personalized Image Generation with Layout-Guidance via Dual Representation Alignment
Track: Proceedings Track
Keywords: representation alignment, personalized image generation
Abstract: We propose MultiPersona-Align, a novel approach to personalized image generation that enhances multi-subject diffusion models through self-supervised feature alignment. Existing methods rely primarily on spatial masking for subject control, and they often produce semantically inconsistent features that fail to preserve subject-specific visual characteristics. Our method introduces a Dual Alignment Framework: (1) a Spatially-Aligned Subject-Specific Cross-Attention Mechanism that aligns subject-specific diffusion features with the corresponding DINOv2 CLS tokens within each subject's spatial region, and (2) Patch-Aligned Self-Attention that ensures global semantic consistency by aligning full-image diffusion features with DINOv2 patch representations. This approach leverages DINOv2's robust semantic understanding without requiring additional training data or annotations. Experiments on multi-subject generation tasks demonstrate that our alignment losses significantly improve subject fidelity and semantic consistency while maintaining spatial control. The method integrates seamlessly into existing architectures, adding minimal computational overhead during training while providing substantial quality improvements in personalized image generation.
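The two alignment objectives described in the abstract can be sketched as cosine-similarity losses. The sketch below is illustrative only, not the authors' implementation: it assumes diffusion features have already been projected to DINOv2's embedding dimension, represents spatial regions as binary masks over the feature grid, and all function names (`subject_alignment_loss`, `patch_alignment_loss`) are hypothetical.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Cosine similarity along the last (feature) axis."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    return np.sum(a * b, axis=-1)

def subject_alignment_loss(diff_feats, masks, cls_tokens):
    """Objective (1), sketched: for each subject, mean-pool the diffusion
    features inside that subject's spatial mask and pull the pooled vector
    toward the subject's DINOv2 CLS token.

    diff_feats: (HW, D) projected diffusion features on the spatial grid.
    masks:      list of (HW,) binary masks, one per subject.
    cls_tokens: list of (D,) DINOv2 CLS tokens, one per subject reference.
    """
    losses = []
    for mask, cls_tok in zip(masks, cls_tokens):
        w = mask.reshape(-1, 1)                              # (HW, 1)
        pooled = (diff_feats * w).sum(0) / (w.sum() + 1e-8)  # (D,)
        losses.append(1.0 - cosine_sim(pooled, cls_tok))
    return float(np.mean(losses))

def patch_alignment_loss(diff_feats, patch_tokens):
    """Objective (2), sketched: align every spatial diffusion feature with
    the DINOv2 patch token at the same location for global consistency."""
    return float(np.mean(1.0 - cosine_sim(diff_feats, patch_tokens)))
```

In training, losses like these would typically be added to the diffusion denoising objective with small weighting coefficients, so the alignment acts as a regularizer rather than replacing the generative loss.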
Submission Number: 154