Damon: Dynamic Model Pruning for Dense Large Language Models

02 Sept 2025 (modified: 25 Sept 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLMs, structured pruning, dynamic sparsity
Abstract: With their vast numbers of parameters, Large Language Models (LLMs) have shown great potential across a wide range of tasks. Structured pruning is a widely adopted technique for reducing model size and accelerating inference. However, conventional structured pruning is static: it permanently discards model components, leading to significant and irreversible performance degradation. To address this limitation, we propose a **D**yn**a**mic **mo**del pru**n**ing framework for dense LLMs. Our approach employs token-level routers to selectively activate a subset of model structures on each forward pass, rather than permanently removing the inactive ones. This mechanism dynamically allocates the computational budget according to input difficulty, striking an effective trade-off between performance and efficiency. Extensive experiments on four families of LLMs demonstrate that our method outperforms both static and dynamic structured pruning baselines under the same computational budget.
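As a rough illustration of the mechanism the abstract describes, below is a minimal PyTorch sketch of a token-level router that activates only `k` of `n` candidate structures per token instead of pruning them permanently. The module names, the use of MLP blocks as the prunable "structures", and the top-k softmax gating are assumptions made for illustration (resembling mixture-of-experts-style routing), not the paper's actual design.

```python
# Minimal sketch: token-level routing over prunable structures.
# Assumptions (not from the paper): structures are MLP blocks,
# and gating is top-k softmax over per-token router scores.
import torch
import torch.nn as nn


class RoutedLayer(nn.Module):
    def __init__(self, d_model: int, n_structures: int, k: int):
        super().__init__()
        self.k = k
        # One learnable score per structure, computed per token.
        self.router = nn.Linear(d_model, n_structures)
        self.structures = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_structures)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); scores: (batch, seq, n_structures)
        scores = self.router(x)
        topk = scores.topk(self.k, dim=-1)
        gates = torch.zeros_like(scores).scatter(
            -1, topk.indices, topk.values.softmax(dim=-1))
        # Structures with a zero gate are skipped for this token but
        # stay in the model, so capacity is allocated per input rather
        # than discarded irreversibly as in static pruning.
        out = torch.zeros_like(x)
        for i, block in enumerate(self.structures):
            g = gates[..., i:i + 1]
            if g.any():
                out = out + g * block(x)
        return out


# Usage: each token is routed through k of n structures per pass.
layer = RoutedLayer(d_model=64, n_structures=8, k=2)
y = layer(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```

Under this kind of scheme, the compute budget per token is controlled by `k`, which is how a dynamic approach can match a static pruning baseline's budget while keeping all parameters available.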
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: true
Submission Guidelines: true
Anonymous Url: true
No Acknowledgement Section: true
Submission Number: 1031