Sustainable AI: Efficient Pruning of Large Language Models in Resource-Limited Environments

Published: 09 Oct 2024, Last Modified: 19 Nov 2024, Compression Workshop @ NeurIPS 2024, CC BY 4.0
Keywords: Large Language Models (LLMs), Weight Evaluation, Model Pruning, Sustainable AI, Performance Optimization
TL;DR: Efficiently pruning large language models to enhance performance and sustainability in GPU-scarce environments.
Abstract: The rapid growth and deployment of large language models (LLMs) such as ChatGPT have revolutionized artificial intelligence, particularly natural language processing, but they carry significant computational and environmental costs, including high energy consumption and carbon emissions. To address these challenges, our research introduces novel pruning techniques, "evolution of weights" and "smart pruning", that improve the efficiency of deep neural networks, especially on embedded devices. By systematically evaluating the importance of individual parameters during training, our methods achieve higher compression rates and faster computation while preserving accuracy, outperforming traditional pruning approaches. Extensive experiments with both scaled-down and larger multimodal LLMs show that moderate pruning improves efficiency and reduces resource consumption with minimal accuracy loss, whereas excessive pruning degrades performance. Our LLM experiments, available on GitHub, underscore the critical need for optimized AI models that balance technological advancement with ecological sustainability.
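To make the idea of importance-based pruning concrete, the following is a minimal PyTorch sketch, not the authors' "evolution of weights" or "smart pruning" algorithm. It uses absolute weight magnitude as a stand-in for the per-parameter importance scores evaluated during training; the function name `prune_by_importance` and the chosen sparsity level are hypothetical.

```python
import torch
import torch.nn as nn

def prune_by_importance(module: nn.Linear, sparsity: float) -> None:
    """Zero out the fraction `sparsity` of weights with the lowest importance.

    Importance is approximated here by absolute weight magnitude; a score
    tracked over the course of training could replace this proxy.
    """
    with torch.no_grad():
        importance = module.weight.abs()
        k = int(sparsity * importance.numel())
        if k == 0:
            return
        # Threshold below which weights are treated as unimportant.
        threshold = importance.flatten().kthvalue(k).values
        mask = (importance > threshold).to(module.weight.dtype)
        module.weight.mul_(mask)

# Usage: prune 30% of the weights in a toy layer.
layer = nn.Linear(512, 512)
prune_by_importance(layer, sparsity=0.3)
print(f"Remaining nonzero weights: {(layer.weight != 0).float().mean():.2%}")
```

In practice, the retained weights would be fine-tuned (or the mask re-evaluated periodically during training) to recover any accuracy lost at higher sparsity levels.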
Submission Number: 2