DiffBlender: Composable and versatile multimodal text-to-image diffusion models

Published: 01 Jan 2026 · Last Modified: 10 Nov 2025 · Expert Syst. Appl. 2026 · CC BY-SA 4.0
Abstract: Highlights
• Introduces DiffBlender to unify multiple input modalities (structure, layout, and attribute) within a single T2I framework.
• Utilizes a compact "Blender block" that preserves the pre-trained diffusion parameters, minimizing additional training overhead.
• Enables efficient multimodal generation and composability across diverse conditions and user preferences.
• Proposes mode-specific guidance for precise control over each modality, ensuring balanced and high-fidelity image synthesis.
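To make the "mode-specific guidance" highlight concrete, here is a minimal sketch of one plausible realization: extending classifier-free guidance with a separate weight per conditioning modality, so each condition's influence can be tuned independently. The function name, the per-modality weighting scheme, and the toy inputs are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def mode_specific_guidance(eps_uncond, eps_cond_per_mode, scales):
    """Hypothetical sketch: combine per-modality noise predictions.

    eps_uncond: unconditional noise prediction from the diffusion model.
    eps_cond_per_mode: dict mapping modality name -> conditional prediction.
    scales: dict mapping modality name -> guidance weight (assumed API).
    """
    guided = eps_uncond.copy()
    for mode, eps_m in eps_cond_per_mode.items():
        # Each modality contributes its own guidance direction,
        # scaled independently (analogous to classifier-free guidance).
        guided += scales[mode] * (eps_m - eps_uncond)
    return guided

# Toy usage with random tensors standing in for model outputs.
rng = np.random.default_rng(0)
eps_u = rng.normal(size=(4,))
eps_c = {"sketch": eps_u + 1.0, "layout": eps_u - 0.5}
out = mode_specific_guidance(eps_u, eps_c, {"sketch": 2.0, "layout": 1.0})
```

Setting a modality's scale to zero drops that condition entirely, which is one way the composability described above could be exposed to users.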