Keywords: Diffusion Models, Generative Models, Personalized Text-to-Image
TL;DR: LayerComposer enables Photoshop-like control for multi-subject text-to-image generation, allowing users to compose scenes by placing, resizing, and locking elements in a layered canvas with high fidelity.
Abstract: Despite their impressive visual fidelity, existing personalized generative models lack interactive control over spatial composition and scale poorly to multiple subjects. To address these limitations, we present \textit{LayerComposer}, an interactive framework for personalized, multi-subject text-to-image generation. Our approach introduces two main contributions: (1) a \textit{layered canvas}, a novel representation in which each subject is placed on a distinct layer, enabling occlusion-free composition; and (2) a \textit{locking mechanism} that preserves selected layers with high fidelity while allowing the remaining layers to adapt flexibly to the surrounding context. As in professional image-editing software, the layered canvas allows users to \textit{place}, \textit{resize}, or \textit{lock} input subjects through intuitive layer manipulation. The locking mechanism is versatile and requires no architectural changes, relying instead on the model's inherent positional embeddings combined with a complementary data-sampling strategy.
Extensive experiments demonstrate that \textit{LayerComposer} achieves superior spatial control and identity preservation compared to state-of-the-art methods in human-centric personalized image generation.
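To make the abstract's interaction model concrete, below is a minimal sketch of what a layered-canvas interface with place/resize/lock operations might look like. All names here (`Layer`, `LayeredCanvas`, `place`, `resize`, `lock`) are illustrative assumptions, not the paper's actual API; the sketch only models the user-facing canvas state, not the generative model or positional-embedding mechanism.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Layer:
    """One subject on the canvas: an identifier plus placement and lock state.
    (Hypothetical structure; the paper does not specify its internal representation.)"""
    subject_id: str
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in canvas pixels
    locked: bool = False             # locked layers are to be preserved with high fidelity

@dataclass
class LayeredCanvas:
    """Ordered stack of subject layers; later layers occlude earlier ones."""
    width: int
    height: int
    layers: List[Layer] = field(default_factory=list)

    def place(self, subject_id: str, bbox: Tuple[int, int, int, int]) -> Layer:
        """Add a subject on its own layer at the given position and size."""
        layer = Layer(subject_id, bbox)
        self.layers.append(layer)
        return layer

    def resize(self, layer: Layer, new_size: Tuple[int, int]) -> None:
        """Change a layer's size while keeping its top-left anchor fixed."""
        x, y, _, _ = layer.bbox
        layer.bbox = (x, y, *new_size)

    def lock(self, layer: Layer) -> None:
        """Mark a layer as locked so downstream generation preserves it."""
        layer.locked = True

# Usage: compose a two-subject scene, lock one subject, resize the other.
canvas = LayeredCanvas(1024, 1024)
person = canvas.place("person", (100, 200, 300, 600))
pet = canvas.place("pet", (500, 500, 250, 250))
canvas.lock(person)            # this layer should be kept with high fidelity
canvas.resize(pet, (300, 300))  # unlocked layers remain free to adapt to context
```

Keeping each subject on its own layer, rather than flattening them into a single conditioning image, is what allows occlusion-free composition in this scheme: overlap is resolved by stacking order rather than by destructive pixel merging.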
Supplementary Material: zip
Primary Area: generative models
Submission Number: 3808