FreeFuse: Multi-Subject LoRA Fusion via Adaptive Token-Level Routing at Test Time

TMLR Paper9019 Authors

18 May 2026 (modified: 01 Jun 2026) | Under review for TMLR | CC BY 4.0
Abstract: This paper proposes FreeFuse, a training-free framework for multi-subject text-to-image generation through automatic fusion of multiple subject LoRAs. In contrast to prior studies that focus on retraining LoRAs to alleviate feature conflicts, our analysis reveals that spatially confining each subject LoRA's output to its target region, and preventing other LoRAs from intruding into that region, is sufficient to mitigate these conflicts. Accordingly, we apply Adaptive Token-Level Routing at inference time. However, obtaining reliable routing regions remains challenging. Existing methods that rely on text-image latent association, such as raw cross-attention or concept-level similarity matching, often suffer from sparse activations, hole artifacts, and unstable localization when handling visually similar subjects, leading to incomplete or ambiguous subject masks. To address these issues, we introduce FreeFuseAttn, a mechanism that exploits the flow-matching model's intrinsic semantic alignment to dynamically match subject-specific tokens to their corresponding spatial regions at early denoising timesteps, thereby removing the need for external segmentors. FreeFuse is also highly practical: it requires no additional training, model modifications, or user-defined spatial constraints. Users need only provide subject activation words, so the method integrates directly into standard workflows. Extensive experiments validate that FreeFuse outperforms existing approaches in both identity preservation and compositional fidelity. Our code is available at https://anonymous.4open.science/r/FreeFuse_anno-FC99.
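The routing idea in the abstract, confining each subject LoRA's output to its own region, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `route_tokens` and `fuse_lora_outputs`, the per-token score matrix, the threshold, and the choice to leave background tokens untouched by any LoRA are all assumptions made for illustration. How FreeFuseAttn actually derives the regions is not reproduced here.

```python
from typing import List

def route_tokens(scores: List[List[float]], threshold: float = 0.0) -> List[int]:
    """Assign each image token to at most one subject.

    scores[s][t] is a hypothetical similarity between subject s and token t.
    Returns one subject index per token, or -1 (background) when the best
    score does not exceed `threshold`.
    """
    num_subjects = len(scores)
    num_tokens = len(scores[0])
    assignment = []
    for t in range(num_tokens):
        column = [scores[s][t] for s in range(num_subjects)]
        best = max(range(num_subjects), key=column.__getitem__)
        assignment.append(best if column[best] > threshold else -1)
    return assignment

def fuse_lora_outputs(
    base: List[List[float]],
    deltas: List[List[List[float]]],
    assignment: List[int],
) -> List[List[float]]:
    """Add each subject's LoRA delta only on the tokens routed to it.

    base[t] is the base layer output for token t; deltas[s][t] is the
    (hypothetical) output of subject s's LoRA branch for token t.
    Background tokens keep the base output (an assumption of this sketch).
    """
    fused = []
    for t, owner in enumerate(assignment):
        if owner < 0:
            fused.append(list(base[t]))
        else:
            fused.append([b + d for b, d in zip(base[t], deltas[owner][t])])
    return fused

# Toy example: 2 subjects, 3 image tokens, 2 feature channels.
scores = [[0.9, 0.2, 0.1],
          [0.1, 0.8, 0.05]]
assignment = route_tokens(scores, threshold=0.5)

base = [[1.0, 1.0] for _ in range(3)]
deltas = [
    [[0.5, 0.5] for _ in range(3)],    # subject 0's LoRA delta on every token
    [[-1.0, -1.0] for _ in range(3)],  # subject 1's LoRA delta on every token
]
fused = fuse_lora_outputs(base, deltas, assignment)
```

In this toy case, token 0 receives only the first subject's delta, token 1 only the second subject's, and token 2 falls below the threshold and keeps the base output, so neither LoRA intrudes on the other's region.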
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yuhang_Zang1
Submission Number: 9019