Accelerate Diffusion Transformers with Feature Momentum

ICLR 2026 Conference Submission13027 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Diffusion Models, Model Acceleration, Adaptive Momentum

TL;DR: We propose FeMo, a momentum-based acceleration framework for diffusion models that predicts future features from historical steps, and Adapted-FeMo, which achieves up to 7.1× speedup without sacrificing generation quality.

Abstract: Diffusion models have demonstrated outstanding generative capabilities in image and video synthesis. However, their heavy computational burden, due in particular to the sequential denoising process and large model sizes, makes it difficult for them to meet the demands of real-time applications. In this paper, motivated by the continuity of diffusion models in the feature space, we introduce FeMo, which employs a momentum mechanism to stabilize the dynamics of diffusion models across timesteps, allowing us to accurately predict the features at future timesteps from historical information. We further propose Adapted-FeMo, which adaptively searches for the optimal coefficient for each generated sample. Extensive experiments demonstrate its effectiveness, e.g., a 4.99$\times$ acceleration on FLUX with a 0.86% improvement in image reward. While maintaining generation quality, Adapted-FeMo achieves a maximum speedup of 7.10$\times$ on DiT and 6.24$\times$ on FLUX. Our code is available in the supplementary material and will be released on GitHub.
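
To make the idea of predicting future features from historical ones concrete, the following is a minimal illustrative sketch, not the FeMo update rule itself, which the abstract does not specify. It assumes that "momentum" can be modeled as an exponential moving average of per-step feature differences, and that a skipped timestep is filled in by linear extrapolation along that average. The class name `MomentumFeaturePredictor`, the coefficient `beta`, and the methods `update` and `predict` are all hypothetical names introduced for this example.

```python
from typing import List, Optional

Vector = List[float]


class MomentumFeaturePredictor:
    """Illustrative sketch only (an assumption, not the paper's method):
    keeps an exponential moving average of per-step feature differences
    and extrapolates along it to estimate features at future timesteps."""

    def __init__(self, beta: float = 0.5) -> None:
        # beta is a hypothetical momentum coefficient in [0, 1).
        if not 0.0 <= beta < 1.0:
            raise ValueError("beta must be in [0, 1)")
        self.beta = beta
        self.last: Optional[Vector] = None      # most recent computed feature
        self.velocity: Optional[Vector] = None  # smoothed per-step change

    def update(self, feature: Vector) -> None:
        """Record a feature that was actually computed by the network."""
        if self.last is not None:
            delta = [f - p for f, p in zip(feature, self.last)]
            if self.velocity is None:
                # First observed difference initializes the momentum term.
                self.velocity = delta
            else:
                self.velocity = [
                    self.beta * v + (1.0 - self.beta) * d
                    for v, d in zip(self.velocity, delta)
                ]
        self.last = list(feature)

    def predict(self, steps_ahead: int = 1) -> Vector:
        """Estimate the feature `steps_ahead` timesteps after the last update."""
        if self.last is None:
            raise RuntimeError("call update() at least once before predict()")
        if self.velocity is None:
            # Only one observation so far: fall back to reusing it.
            return list(self.last)
        return [f + steps_ahead * v for f, v in zip(self.last, self.velocity)]


# Toy usage: features that change by a constant amount per step.
predictor = MomentumFeaturePredictor(beta=0.5)
for computed in ([0.0, 0.0], [1.0, 2.0], [2.0, 4.0]):
    predictor.update(computed)
estimate = predictor.predict(steps_ahead=2)
```

In a sampler built along these lines, timesteps with a full forward pass would call `update`, and skipped timesteps would call `predict` instead of running the network. The coefficient here is fixed by hand, whereas Adapted-FeMo is described as searching for the optimal coefficient per generated sample.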

Supplementary Material: zip

Primary Area: generative models

Submission Number: 13027
