Keywords: Text-to-Image Generation, Diffusion Models, LoRA, Multi-Character Generation, Modular Inference, Concept Vanishing, Scene Coherence
Abstract: Low-Rank Adaptation (LoRA) provides a lightweight and flexible approach for personalising diffusion models with high-fidelity characters. Yet extending LoRA to multi-character generation remains difficult: fusion-based methods require recomputing merged adapters for each character set, while non-LoRA-fusion approaches, despite avoiding image-level conditions such as pose guidance or edge maps, degrade rapidly beyond four characters, leading to scene incoherence, character vanishing, and character blending. These limitations highlight a fundamental gap: current pipelines cannot reliably scale to complex multi-character scenes while maintaining efficiency and visual quality. To address this gap, we present MC-LoRA, an inference-time framework that scales multi-character generation without retraining. MC-LoRA introduces two innovations: (i) an attention-weighted injection mechanism that balances contributions across adapters to preserve global coherence; and (ii) a dual-loss guidance scheme combining a Character Balancing Loss to prevent vanishing and a Spatial Localisation Loss to suppress blending. Experiments on prompts with up to eight characters show that MC-LoRA significantly outperforms LoRA-Composer, improving ImageReward from 0.046 to 0.395 on complex scenes and reducing sampling time by more than 2×. These results establish MC-LoRA as an efficient and robust solution for scalable multi-character personalisation.
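To make the first innovation concrete, the sketch below shows one plausible form of attention-weighted injection: a linear layer carrying one LoRA adapter per character, with each adapter's low-rank delta rescaled by a per-character weight (e.g. normalised cross-attention mass). All names here (AttentionWeightedLoRALinear, char_weights, the rank choice) are illustrative assumptions, not MC-LoRA's actual implementation.

```python
# Minimal sketch of attention-weighted LoRA injection, assuming one
# adapter per character. Names and shapes are hypothetical.
import torch
import torch.nn as nn


class AttentionWeightedLoRALinear(nn.Module):
    """Base linear layer plus several LoRA adapters whose contributions
    are rescaled by per-character weights at inference time."""

    def __init__(self, base: nn.Linear, num_adapters: int, rank: int = 4):
        super().__init__()
        self.base = base
        d_in, d_out = base.in_features, base.out_features
        # One (down, up) low-rank pair per character adapter.
        self.downs = nn.ModuleList(
            nn.Linear(d_in, rank, bias=False) for _ in range(num_adapters))
        self.ups = nn.ModuleList(
            nn.Linear(rank, d_out, bias=False) for _ in range(num_adapters))

    def forward(self, x: torch.Tensor, char_weights: torch.Tensor) -> torch.Tensor:
        # char_weights: (num_adapters,), e.g. each character's normalised
        # cross-attention mass, so no single adapter dominates shared layers.
        out = self.base(x)
        w = char_weights / char_weights.sum().clamp_min(1e-8)
        for wi, down, up in zip(w, self.downs, self.ups):
            out = out + wi * up(down(x))
        return out


# Usage: weight two character adapters by their (assumed) attention mass.
layer = AttentionWeightedLoRALinear(nn.Linear(64, 64), num_adapters=2)
x = torch.randn(1, 77, 64)
weights = torch.tensor([0.7, 0.3])
y = layer(x, weights)
print(y.shape)  # torch.Size([1, 77, 64])
```

The dual-loss guidance can be sketched in the same hedged spirit: a balancing term that penalises characters whose attention mass collapses (vanishing), and a localisation term that penalises attention leaking outside each character's layout region (blending). The exact loss forms and weights below are assumptions for illustration.

```python
# Hedged sketch of the dual-loss guidance; attn and masks are assumed
# per-character cross-attention maps and binary layout regions.
import torch


def dual_guidance_loss(attn: torch.Tensor, masks: torch.Tensor,
                       lam_balance: float = 1.0, lam_spatial: float = 1.0):
    # attn, masks: (num_chars, H, W)
    mass = attn.flatten(1).sum(dim=1)             # attention mass per character
    # Character Balancing Loss: keep per-character mass comparable (anti-vanishing).
    balance = ((mass - mass.mean()) ** 2).mean()
    # Spatial Localisation Loss: penalise attention outside each region (anti-blending).
    outside = (attn * (1 - masks)).flatten(1).sum(dim=1)
    spatial = (outside / mass.clamp_min(1e-8)).mean()
    return lam_balance * balance + lam_spatial * spatial
```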
Primary Area: generative models
Submission Number: 22233