ALTER: All-in-One Layer Pruning and Temporal Expert Routing for Efficient Diffusion Generation

Published: 18 Sept 2025 · Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: Diffusion, Pruning, Mixture of Experts, Efficiency
TL;DR: An all-in-one framework that significantly speeds up diffusion models by dynamically creating pruned temporal experts during finetuning.
Abstract: Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images. However, their iterative denoising process incurs significant computational overhead at inference time, limiting practical deployment in resource-constrained environments. Existing acceleration methods often adopt uniform strategies that fail to capture the temporal variations of the diffusion generation process, while the commonly adopted sequential $\textit{pruning-then-fine-tuning}$ strategy suffers from sub-optimality due to the misalignment between pruning decisions made on pretrained weights and the model's final parameters. To address these limitations, we introduce $\textbf{ALTER}$: $\textbf{A}$ll-in-One $\textbf{L}$ayer Pruning and $\textbf{T}$emporal $\textbf{E}$xpert $\textbf{R}$outing, a unified framework that transforms diffusion models into a mixture of efficient temporal experts. ALTER achieves a single-stage optimization that unifies layer pruning, expert routing, and model fine-tuning by employing a trainable hypernetwork, which dynamically generates layer pruning decisions and routes timesteps to specialized, pruned expert sub-networks throughout the ongoing fine-tuning of the UNet. This unified co-optimization strategy enables significant efficiency gains while preserving high generative quality. Specifically, ALTER matches the visual fidelity of the original 50-step Stable Diffusion v2.1 model while using only 25.9\% of its total MACs with just 20 inference steps, delivering a 3.64$\times$ speedup at 35\% layer sparsity.
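To make the routing-plus-pruning idea concrete, here is a minimal sketch of the inference-time behavior the abstract describes: each denoising timestep is routed to one "temporal expert", and each expert is a sub-network defined by a binary layer-keep mask at roughly 35\% sparsity. Everything here is an illustrative assumption — the layer count, the number of experts, and the fixed uniform routing all stand in for quantities the paper learns jointly with a hypernetwork during fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LAYERS = 8    # hypothetical UNet block count
NUM_EXPERTS = 4   # hypothetical number of temporal experts
SPARSITY = 0.35   # target layer sparsity from the abstract

def route_expert(t, num_steps=20, num_experts=NUM_EXPERTS):
    """Map a denoising timestep to a temporal expert.
    Here: a fixed uniform partition of the 20-step schedule;
    in ALTER this routing is learned by the hypernetwork."""
    return min(t * num_experts // num_steps, num_experts - 1)

# Each expert is a binary layer mask with ~35% of layers pruned.
# (Random masks for illustration; ALTER generates them via a
# trainable hypernetwork co-optimized with fine-tuning.)
expert_masks = np.stack([
    rng.permutation(np.arange(NUM_LAYERS) >= round(SPARSITY * NUM_LAYERS))
    for _ in range(NUM_EXPERTS)
])

def pruned_forward(x, t, layers):
    """Run only the layers kept by the expert routed for timestep t."""
    mask = expert_masks[route_expert(t)]
    for keep, layer in zip(mask, layers):
        if keep:
            x = layer(x)
    return x
```

The MAC savings follow directly from the mask: with 3 of 8 layers skipped at every step, each denoising step costs roughly 62.5\% of the dense forward pass, on top of the reduction from running 20 steps instead of 50.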
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 7665