Keywords: Large Language Models, Adaptive Pruning, Gradient-steered Search
TL;DR: An adaptive LLM pruning framework that uses an encoder-evaluator-decoder approach to optimize pruning for runtime adaptability.
Abstract: Deploying Large Language Models (LLMs) at the edge is crucial for data privacy and offline operation, yet their massive parameter count poses significant resource challenges. While existing methods rely on discrete-space heuristics to search for pruning configurations, we introduce a fundamentally different approach: reformulating the search for optimal LLM pruning configurations as gradient optimization in a learned continuous representation space. Our method, ALPS (Adaptive Layer Pruning via Search), embeds discrete pruning configurations into a continuous space where efficient gradient-based optimization becomes possible, then decodes optimal representations back to implementable discrete pruning schemes. This encoder-evaluator-decoder architecture automatically learns from collected "pruning-score" data pairs, eliminating manual tuning while jointly optimizing for model performance, latency, and energy consumption in a deployment-specific manner. Extensive experiments across Llama-7B, Llama2-7B, Llama2-13B, and Vicuna-7B demonstrate ALPS's superiority, achieving up to 34.1% energy reduction and 33.5% lower latency while maintaining over 91% of original performance. At high pruning ratios (50%), ALPS consistently outperforms state-of-the-art methods in both perplexity and downstream task accuracy.
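To make the encoder-evaluator-decoder idea concrete, below is a minimal sketch (not the authors' code) of such a pipeline: an encoder maps a per-layer pruning-ratio vector to a latent, a surrogate evaluator predicts a scalar "pruning score" from that latent, and a decoder maps an optimized latent back to a pruning configuration. All module names, dimensions, and loss terms here are illustrative assumptions, since the abstract only describes the approach at a high level.

```python
# Illustrative sketch of an encoder-evaluator-decoder pruning search.
# Assumptions: a configuration is a vector of per-layer pruning ratios,
# and "score" is a scalar combining perplexity, latency, and energy
# (lower is better). Sizes and hyperparameters are placeholders.
import torch
import torch.nn as nn

NUM_LAYERS, LATENT_DIM = 32, 64  # e.g. a 32-layer LLM; latent size is a guess

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NUM_LAYERS, 128), nn.ReLU(),
                                 nn.Linear(128, LATENT_DIM))
    def forward(self, cfg):              # cfg: (B, NUM_LAYERS) ratios in [0, 1]
        return self.net(cfg)

class Evaluator(nn.Module):              # surrogate predicting the pruning score
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, z):
        return self.net(z).squeeze(-1)

class Decoder(nn.Module):                # latent -> implementable configuration
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, NUM_LAYERS), nn.Sigmoid())
    def forward(self, z):
        return self.net(z)

enc, ev, dec = Encoder(), Evaluator(), Decoder()

def train_step(cfgs, scores, opt):
    """One step on collected (configuration, score) pairs:
    reconstruction loss + score-regression loss."""
    z = enc(cfgs)
    loss = (nn.functional.mse_loss(dec(z), cfgs)
            + nn.functional.mse_loss(ev(z), scores))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def search(init_cfg, steps=200, lr=1e-2):
    """Gradient-steered search in the learned latent space, then decode."""
    z = enc(init_cfg).detach().requires_grad_(True)
    z_opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        score = ev(z).sum()              # minimize the predicted score
        z_opt.zero_grad(); score.backward(); z_opt.step()
    return dec(z).detach()               # continuous ratios; quantize to deploy

# Usage sketch: train on gathered data, then search from a seed configuration.
# opt = torch.optim.Adam([*enc.parameters(), *ev.parameters(), *dec.parameters()], lr=1e-3)
# train_step(cfg_batch, score_batch, opt); best_cfg = search(seed_cfg)
```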
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23899