Keywords: Generative models, diffusion, inference acceleration
TL;DR: Diffusion on Demand: accelerating diffusion transformers by reusing cached features and selectively adapting them with learned, gated linear modulation.
Abstract: Diffusion transformers demonstrate significant potential across a variety of generation tasks but suffer from high computational cost at inference. Recently, feature caching methods have been introduced to improve inference efficiency by storing features at certain timesteps and reusing them at subsequent timesteps. However, their effectiveness is limited because they only choose between reusing a cached feature as-is and recomputing it with full model inference. Motivated by the high cosine similarity between features at consecutive timesteps, we propose a cache-based framework that reuses features and selectively adapts them through linear modulation. In our framework, the selection is performed by a modulation gate, and both the gate and the modulation parameters are learned. Extensive experiments show that our method achieves generation quality comparable to the original sampler while requiring significantly less computation. For example, FLOPs and inference latency are reduced by $2.93\times$ and $2.15\times$ for DiT-XL/2 and by $2.83\times$ and $1.50\times$ for PixArt-$\alpha$, respectively. We find that modulation is effective when applied to as few as 2\% of layers, resulting in negligible computational overhead.
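As a rough illustration of the idea described in the abstract, the following PyTorch-style sketch shows one way a cached feature could be reused and selectively adapted via learned linear modulation controlled by a learned gate. The class name, tensor shapes, and the soft-gating scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): reuse a cached feature and optionally
# adapt it with a learned linear modulation, selected by a learned gate.
import torch
import torch.nn as nn


class CachedModulatedLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Learned linear modulation parameters (per-channel scale and shift).
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))
        # Learned gate logit; sigmoid(gate_logit) decides how much modulation to apply.
        self.gate_logit = nn.Parameter(torch.zeros(1))

    def forward(self, cached_feature: torch.Tensor) -> torch.Tensor:
        # cached_feature: feature stored at an earlier timestep, e.g. shape (B, N, dim).
        gate = torch.sigmoid(self.gate_logit)
        modulated = cached_feature * self.scale + self.shift
        # Soft gating during training; at inference the gate could be thresholded
        # so that most layers simply reuse the cached feature unchanged.
        return gate * modulated + (1.0 - gate) * cached_feature
```

Under this (assumed) formulation, modulation adds only an elementwise scale, shift, and blend per gated layer, which is consistent with the abstract's claim that applying it to a small fraction of layers incurs negligible overhead.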
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 19855