SemCache: Adaptive Semantic-Aware Caching for Efficient Video Diffusion

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · Everyone · Revisions · BibTeX · CC BY 4.0
Keywords: Video diffusion model acceleration; Video generation; Diffusion transformers
Abstract: Diffusion models have achieved significant progress in video generation, but slow inference remains a major challenge. Existing cache-based acceleration methods for video diffusion deliver considerable speedups. One efficient caching strategy reuses model outputs by estimating and exploiting the fluctuating differences between outputs at adjacent timesteps. However, this strategy relies on extensive calibration sets and neglects the fact that variations in prompt semantics affect the magnitude of these output differences. The phenomenon is evident when comparing videos generated from different prompts: while "a horse running on the grassland" produces highly dynamic content, "a person reading in a coffee shop" yields relatively static scenes. Building on this observation, we propose SemCache, a novel training-free method that adaptively adjusts its caching strategy by perceiving changes in prompt semantics. Key innovations include a Prompt Semantic-Aware (PSA) caching mechanism that evaluates prompt semantics and dynamically selects a caching strategy tailored to the current timestep based on this semantic information. We further introduce a Temporal Motion Metric (TMM) scheme that guides compute allocation along the temporal dimension based on motion information, which not only ensures motion consistency in the generated videos but also further reduces inference time. Experimental results demonstrate that SemCache achieves 2.45× and 2.66× speedups on HunyuanVideo and Wan2.1, respectively, while maintaining high video quality. Our code will be made publicly available.
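The abstract's core idea, reusing cached outputs only when the estimated change between timesteps is small relative to a semantics-dependent tolerance, can be illustrated with a minimal sketch. Everything below is hypothetical (the function name, the `dynamism_score` input, and the linear threshold scaling are illustrative assumptions, not the paper's actual PSA mechanism):

```python
# Hypothetical sketch of semantic-aware cache reuse; not the authors' code.
# Assumption: some upstream module maps a prompt to a "dynamism" score in
# [0, 1] (e.g. near 1 for "a horse running on the grassland", near 0 for
# "a person reading in a coffee shop").

def reuse_cached_output(output_delta: float,
                        dynamism_score: float,
                        base_threshold: float = 0.05) -> bool:
    """Decide whether to reuse the cached model output at this timestep.

    output_delta:    estimated difference between outputs at adjacent
                     timesteps (larger means the output is changing fast).
    dynamism_score:  prompt-level motion estimate in [0, 1]; higher scores
                     shrink the reuse threshold, forcing fresh computation
                     more often for dynamic prompts.
    """
    threshold = base_threshold * (1.0 - dynamism_score)
    return output_delta < threshold

# Static prompt: the same delta falls under the relaxed threshold,
# so the cache is reused and the model step is skipped.
print(reuse_cached_output(output_delta=0.03, dynamism_score=0.1))  # True

# Dynamic prompt: the tightened threshold rejects reuse,
# so the model is re-evaluated at this timestep.
print(reuse_cached_output(output_delta=0.03, dynamism_score=0.9))  # False
```

The design choice sketched here is that a single scalar per prompt modulates a per-timestep reuse test; the paper's PSA caching presumably makes a richer, timestep-dependent decision, but the direction of the trade-off (more reuse for static scenes, less for dynamic ones) matches the observation in the abstract.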
Primary Area: generative models
Submission Number: 11218