FlexControl: Computation-Aware Conditional Control with Differentiable Router for Text-to-Image Generation
Abstract: Spatial conditioning control offers a powerful way to guide diffusion-based generative models. Yet, most implementations (e.g., ControlNet) rely on ad-hoc heuristics to choose which network blocks to control, an approach that varies unpredictably across tasks. To address this gap, we propose FlexControl, a novel framework that equips all diffusion blocks with control signals during training and employs a trainable gating mechanism to dynamically select which control signals to activate at each denoising step. By introducing a computation-aware loss, we encourage a control signal to activate only when it benefits generation quality. By eliminating manual control unit selection, FlexControl enhances adaptability across diverse tasks and streamlines the design pipeline via end-to-end training with the computation-aware loss. Through comprehensive experiments on both UNet and DiT architectures and across different control methods, we show that our method upgrades existing controllable generative models in key aspects of interest. As evidenced by both quantitative and qualitative evaluations, FlexControl preserves or enhances image fidelity while reducing computational overhead by selectively activating only the most relevant blocks for control. These results underscore the potential of a flexible, data-driven approach for controlled diffusion and open new avenues for efficient generative model design. The code will soon be available at https://github.com/Daryu-Fan/FlexControl.
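To make the gating mechanism concrete, the sketch below illustrates one way such a per-block router could look in PyTorch. This is a minimal illustration under our own assumptions, not the paper's exact implementation: the module name BlockRouter, the feature pooling, and the straight-through Gumbel-Softmax gate are illustrative choices for a ControlNet-style setup in which a control-branch output is added to a diffusion block's output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockRouter(nn.Module):
    """Illustrative per-block router: decides whether the control signal
    is injected into one diffusion block at the current denoising step."""

    def __init__(self, feat_dim: int, time_dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(feat_dim + time_dim, 128),
            nn.SiLU(),
            nn.Linear(128, 2),  # logits for [skip, activate]
        )

    def forward(self, feat: torch.Tensor, t_emb: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # Pool spatial (UNet) or token (DiT) features so the decision
        # depends on the current sample and timestep.
        pooled = feat.mean(dim=(-2, -1)) if feat.dim() == 4 else feat.mean(dim=1)
        logits = self.gate(torch.cat([pooled, t_emb], dim=-1))
        # Straight-through Gumbel-Softmax: hard 0/1 decision in the forward
        # pass, differentiable surrogate in the backward pass.
        decision = F.gumbel_softmax(logits, tau=tau, hard=True)
        return decision[:, 1:2]  # (B, 1) gate, 1 = inject control


def gated_injection(block_out: torch.Tensor, ctrl_out: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
    """Add the control-branch output only where the router activates it."""
    g = gate.view(-1, *([1] * (block_out.dim() - 1)))
    return block_out + g * ctrl_out
```

In this sketch the hard gate keeps inference cheap (inactive control blocks can be skipped entirely), while the Gumbel-Softmax relaxation keeps the selection trainable end to end.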
Lay Summary: Spatial conditions (e.g., depth maps, canny edges, sketches) complement the limited visual expressiveness of text descriptions and effectively guide diffusion-based generative models toward the images users imagine. Yet, most implementations rely on ad-hoc heuristics to choose which network blocks to inject conditional control into, and this choice varies unpredictably across tasks. To address this gap, we propose FlexControl, a novel framework that automatically adjusts which diffusion blocks are controlled, depending on the specific sample and timestep.
Specifically, we employ a trainable gating mechanism (a router unit) to dynamically select which control signals to activate. By introducing a computation-aware loss, we encourage a control signal to activate only when it benefits generation quality. By eliminating manual control unit selection, FlexControl enhances adaptability across diverse tasks and streamlines the design pipeline via end-to-end training with the computation-aware loss.
Through comprehensive experiments on different architectures and control methods, we show that our method upgrades existing controllable generative models in key aspects of interest. As evidenced by both quantitative and qualitative evaluations, FlexControl preserves or enhances image fidelity while reducing computational overhead by selectively activating only the most relevant blocks for control. These results underscore the potential of a flexible, data-driven approach for controlled diffusion and open new avenues for efficient generative model design.
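As a rough illustration of how a computation-aware loss could discourage unnecessary activations, the sketch below adds a FLOPs-weighted penalty on the router's activation probabilities to the standard denoising loss. The function name, the lam weight, and the budget term are assumptions made for illustration; the paper's actual loss may differ.

```python
import torch


def computation_aware_loss(diffusion_loss: torch.Tensor,
                           gate_probs: list,
                           block_flops: list,
                           lam: float = 0.01,
                           budget: float = 0.5) -> torch.Tensor:
    """Illustrative objective: denoising loss plus a FLOPs-weighted penalty
    on the fraction of control blocks the router activates, so control is
    used only where it pays off within a target compute budget."""
    total_flops = sum(block_flops)
    # Expected normalized cost of the activated control blocks.
    expected_cost = sum(p.mean() * f for p, f in zip(gate_probs, block_flops)) / total_flops
    # Penalize only when the expected cost exceeds the target budget.
    cost_penalty = torch.relu(expected_cost - budget)
    return diffusion_loss + lam * cost_penalty
```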
Link To Code: ZmQwZ
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Diffusion model, controllable image generation, dynamic routing, data-driven, efficient inference
Submission Number: 3152