Abstract: This paper proposes GreenLLM, a framework for deploying generative Large Language Models (LLMs) on resource-limited edge devices that meets memory and timing constraints while minimizing energy consumption. Specifically, GreenLLM employs an energy estimation scheme based on physical hardware to guide a pruning-ratio generator that incorporates space, weight, and power (SWaP) constraints to produce optimal pruning ratios. For each layer, a dependency-aware, energy-efficient pruner operates in a task-agnostic manner, preserving as much of the LLM's functionality as possible. Finally, the pruned model is fine-tuned on downstream datasets to recover performance.
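As a rough illustration of the layer-wise idea in the abstract (energy estimates guiding per-layer pruning ratios, followed by task-agnostic magnitude pruning), the sketch below is a minimal stand-in: the function names (`allocate_ratios`, `prune_layer`) and the energy-proportional allocation rule are assumptions for illustration, not the paper's actual SWaP-constrained generator or dependency-aware pruner.

```python
# Hypothetical sketch: per-layer pruning ratios scaled by estimated energy cost,
# then magnitude-based pruning within each layer. Names and heuristics here are
# illustrative assumptions, not GreenLLM's published algorithm.

def allocate_ratios(layer_energies, target_fraction):
    """Assign higher pruning ratios to layers with higher estimated energy,
    scaled so the mean ratio equals target_fraction."""
    total = sum(layer_energies)
    raw = [e / total * len(layer_energies) * target_fraction
           for e in layer_energies]
    return [min(r, 0.9) for r in raw]  # cap so no layer is pruned away entirely

def prune_layer(weights, ratio):
    """Zero out the smallest-magnitude fraction `ratio` of a layer's weights
    (task-agnostic: no labels or downstream data needed)."""
    flat = sorted(abs(w) for w in weights)
    k = int(len(flat) * ratio)
    threshold = flat[k] if k < len(flat) else float("inf")
    return [w if abs(w) >= threshold else 0.0 for w in weights]

# Assumed per-layer energy estimates and a 25% average pruning target.
layer_energies = [1.0, 2.0, 1.0]
ratios = allocate_ratios(layer_energies, 0.25)  # energy-hungry layer pruned more
pruned = [prune_layer([0.1, -0.5, 0.02, 0.8], r) for r in ratios]
```

In a real pipeline, the pruned model would then be fine-tuned on downstream datasets to recover any lost accuracy, as the abstract describes.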