FreeFuse: Multi-Subject LoRA Fusion via Auto Masking at Test Time

17 Sept 2025 (modified: 11 Feb 2026)Submitted to ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: text-to-image generation, low-rank adaptation, multi-concept generation
Abstract: This paper proposes FreeFuse, a novel training-free approach for multi-subject text-to-image generation through automatic fusion of multiple subject LoRAs. In contrast to existing methods that either focus on pre-inference LoRA weight merging or rely on segmentation models and complex techniques like noise blending to isolate LoRA outputs, our key insight is that context-aware dynamic subject masks can be automatically derived from cross-attention layer weights. Our analysis shows that constraining each LoRA’s influence to its corresponding subject region via these masks effectively mitigates feature conflicts between LoRAs. FreeFuse demonstrates superior practicality and efficiency as it requires no additional training, no modification to LoRAs, no auxiliary models, and no user-defined prompt templates or region specifications. Alternatively, it only requires users to provide the LoRA activation words for seamless integration into standard workflows. Extensive experiments validate that FreeFuse outperforms existing approaches in both generation quality and usability under the multi-subject generation tasks. We will release the source code upon the official publication of the paper.
Primary Area: generative models
Submission Number: 8708
Loading