OUSAC: Optimized gUidance Scheduling with Adaptive Caching for DiT Acceleration

ICLR 2026 Conference Submission 13434 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Diffusion Acceleration, Diffusion Transformers, Caching, CFG
Abstract: Diffusion models have emerged as the dominant paradigm for high-quality image generation, yet their computational expense remains substantial due to iterative denoising. Classifier-Free Guidance (CFG) significantly enhances generation quality and controllability but doubles the computational cost by requiring both conditional and unconditional forward passes at every timestep. We present OUSAC ($\textbf{O}$ptimized g$\textbf{U}$idance $\textbf{S}$cheduling with $\textbf{A}$daptive $\textbf{C}$aching), a framework that accelerates diffusion transformers (DiT) through systematic optimization. We begin with two key observations that reveal acceleration opportunities: first, the importance of guidance varies dramatically across timesteps -- while a few critical steps require strong guidance, most steps need minimal or even no guidance; second, variable guidance patterns introduce denoising deviations that undermine standard caching methods, which assume a constant CFG scale and feature similarity across steps. Moreover, different transformer blocks are affected to different degrees under such dynamic conditions. Leveraging these insights, we develop a two-stage approach. Stage-1 employs evolutionary algorithms to discover sparse guidance schedules that apply CFG only at critical timesteps, eliminating up to 82\% of unconditional passes. Stage-2 introduces an adaptive rank allocation strategy that tailors calibration effort to each transformer block, maintaining caching effectiveness under variable guidance. Experiments demonstrate that OUSAC significantly outperforms state-of-the-art acceleration methods: it achieves 53\% computational savings and a 15\% improvement in generation quality on DiT-XL/2 (ImageNet 512$\times$512), as well as 60\% savings with a 16.1\% quality improvement on PixArt-$\alpha$ (MSCOCO).
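To make the Stage-1 idea concrete, the sketch below shows a denoising loop that applies CFG only at a sparse set of critical timesteps and skips the unconditional pass everywhere else. This is a minimal illustration under assumed interfaces: the model call signature, `scheduler_step`, and `cfg_steps` are hypothetical placeholders, not OUSAC's actual API or the schedules its evolutionary search discovers.

```python
def sample_with_sparse_cfg(model, scheduler_step, x, timesteps,
                           guidance_scale, cfg_steps):
    """Denoising loop that applies classifier-free guidance (CFG) only
    at a sparse set of critical timesteps.

    `model`, `scheduler_step`, and `cfg_steps` are hypothetical
    placeholders for illustration, not OUSAC's implementation.
    """
    for i, t in enumerate(timesteps):
        # The conditional forward pass is needed at every step.
        eps_cond = model(x, t, use_condition=True)
        if i in cfg_steps:
            # Critical step: also run the unconditional pass and
            # combine the two with the standard CFG formula.
            eps_uncond = model(x, t, use_condition=False)
            eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        else:
            # Non-critical step: skip the unconditional pass, saving
            # one full DiT forward pass at this timestep.
            eps = eps_cond
        # Advance the sample with the sampler's update rule.
        x = scheduler_step(eps, t, x)
    return x
```

In this framing, a schedule whose `cfg_steps` covers only about 18% of timesteps would correspond to the abstract's figure of eliminating up to 82% of unconditional passes.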
Primary Area: generative models
Submission Number: 13434