DISCO: DISCrete nOise for Conditional Control in Text-to-Image Diffusion Models

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Diffusion Models, Conditional Control
Abstract: A major challenge in using diffusion models is aligning outputs with user-defined conditions. Existing conditional generation methods fall into two major categories: classifier-based guidance, which requires a differentiable target model and gradient-based correction, and classifier-free guidance, which embeds conditions directly into the diffusion model but demands expensive joint training and architectural coupling. In this work, we introduce a third paradigm: DISCrete nOise (DISCO) guidance, which replaces the continuous conditional correction term with a finite codebook of discrete noise vectors sampled from a Gaussian prior. Conditional generation is reformulated as a code-selection task, and we train a prediction network to choose the optimal code given the intermediate diffusion state and the conditioning input. Our approach is differentiability-free and training-efficient, avoiding the gradient computation and architectural redundancy of prior methods. Empirical results demonstrate that DISCO achieves competitive controllability while substantially reducing resource demands, positioning it as a scalable and effective alternative for conditional diffusion generation.
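To make the code-selection idea concrete, here is a minimal PyTorch sketch inferred from the abstract alone. It shows a frozen codebook of K noise vectors drawn from a Gaussian prior and a small predictor that scores codes given the intermediate state, timestep, and condition; the selected code then plays the role of the conditional correction added to the unconditional noise estimate. Everything here is an illustrative assumption: the names (CodeSelector), the MLP architecture, the shapes, and the guidance scale w are not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of DISCO-style code selection, based only on the
# abstract. Codebook size, dimensions, and architecture are assumptions.
K, D, COND_DIM = 64, 128, 32  # codebook size, latent dim, condition dim

class CodeSelector(nn.Module):
    """Scores each discrete code given (x_t, t, condition)."""
    def __init__(self):
        super().__init__()
        # Codebook: K noise vectors sampled once from N(0, I) and frozen.
        self.register_buffer("codebook", torch.randn(K, D))
        self.net = nn.Sequential(
            nn.Linear(D + 1 + COND_DIM, 256), nn.SiLU(),
            nn.Linear(256, K),  # one logit per discrete code
        )

    def forward(self, x_t, t, cond):
        # Concatenate state, timestep, and condition; predict code logits.
        h = torch.cat([x_t, t[:, None].float(), cond], dim=-1)
        return self.net(h)

    def select(self, x_t, t, cond):
        # Differentiability-free at sampling time: an argmax lookup
        # replaces the gradient-based correction of classifier guidance.
        idx = self.forward(x_t, t, cond).argmax(dim=-1)
        return self.codebook[idx]

# Usage: add the selected code to the unconditional noise estimate as the
# conditional correction term (the guidance scale `w` is an assumed knob).
selector = CodeSelector()
x_t = torch.randn(8, D)
t = torch.randint(0, 1000, (8,))
cond = torch.randn(8, COND_DIM)
eps_uncond = torch.randn(8, D)  # stand-in for the diffusion model's output
w = 1.5
eps_guided = eps_uncond + w * selector.select(x_t, t, cond)
```

Because selection is a forward pass plus an index lookup, this avoids both the backward pass through a target classifier and any retraining of the diffusion backbone, which is consistent with the efficiency claims in the abstract.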
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 20537