Pruning Without Fine-Tuning: Dynamic Pruning of Autoregressive Image Generation Models to Mixtures of Experts

15 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Mixture of Experts, Dynamic Pruning, Autoregressive Image Generation
Abstract: While autoregressive (AR) models can achieve state-of-the-art performance in image generation, their massive size poses significant challenges for deployment and efficient model serving. Structural pruning has emerged as an effective method for reducing model size and improving inference efficiency, yet existing approaches require recovery fine-tuning because image generation is sensitive to missing parameters. In this work, we propose a novel approach that leverages dynamic pruning to identify and extract sparse experts within dense AR image generation models, enabling their transformation into sparse Mixture of Experts (MoE) architectures. Our method efficiently converts pretrained dense models into MoEs, significantly reducing the number of active parameters per inference step while preserving performance without fine-tuning. By applying top-1 expert routing to MLP layers, we establish a direct link between differentiable dynamic pruning and MoE conversion. Our approach consistently outperforms traditional pruning techniques, maintaining high generation quality without costly recovery fine-tuning. Experimental comparisons demonstrate the efficiency and effectiveness of our method against both static pruning and uncompressed models.
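The abstract describes converting a dense MLP into top-1 routed experts without retraining. For intuition only, below is a minimal PyTorch sketch of what such a conversion could look like: the pretrained MLP's hidden neurons are partitioned into expert groups and a small router picks one group per token. The class name `Top1RoutedMLP`, the contiguous-slice grouping, and the linear router are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class Top1RoutedMLP(nn.Module):
    """Hypothetical sketch: split a pretrained dense MLP (Linear -> act -> Linear)
    into `num_experts` sub-MLPs and route each token to exactly one of them."""

    def __init__(self, fc_in: nn.Linear, act: nn.Module, fc_out: nn.Linear, num_experts: int):
        super().__init__()
        d_model, d_ff = fc_in.in_features, fc_in.out_features
        assert d_ff % num_experts == 0, "hidden size must divide evenly into experts"
        chunk = d_ff // num_experts

        # Slice the pretrained weights into per-expert sub-MLPs (no fine-tuning).
        self.experts = nn.ModuleList()
        for e in range(num_experts):
            lo, hi = e * chunk, (e + 1) * chunk
            w_in = nn.Linear(d_model, chunk)
            w_in.weight.data.copy_(fc_in.weight.data[lo:hi])
            w_in.bias.data.copy_(fc_in.bias.data[lo:hi])
            w_out = nn.Linear(chunk, d_model)
            w_out.weight.data.copy_(fc_out.weight.data[:, lo:hi])
            w_out.bias.data.copy_(fc_out.bias.data)  # each expert keeps the shared output bias
            self.experts.append(nn.Sequential(w_in, act, w_out))

        # Router: one logit per expert; hard top-1 selection at inference,
        # so only one expert's parameters are active per token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        expert_idx = self.router(x).argmax(dim=-1)  # (batch, seq), top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```

In this sketch only one of the `num_experts` sub-MLPs runs per token, which is where the reduction in active parameters per inference step would come from; how the router is actually obtained from differentiable dynamic pruning is specific to the paper and not reproduced here.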
Primary Area: generative models
Submission Number: 6346