Towards Carbon-efficient LLM Life Cycle

Published: 09 Jul 2024, Last Modified: 28 Jan 2026 · HotCarbon'24 · CC BY 4.0
Abstract: With the rise of generative AI, sustainability concerns have intensified due to the computational demands and the need for advanced GPUs. While recent studies have quantified carbon emissions from data centers, a gap exists in fully understanding the lifecycle emissions of generative models and hardware systems. This paper introduces refined carbon models for CPUs and GPUs, aiming to optimize the design space across the machine learning lifecycle, particularly for multi-GPU systems serving generative inference. We present a parameterized embodied carbon model that highlights the substantial contribution of general-purpose CPUs (2x over the system lifetime). Our findings suggest model-dependent strategies for carbon-efficient generative inference, such as optimized batching, model sharding, and parallelization. Combined appropriately, these strategies achieve a 17% improvement in carbon footprint with negligible degradation in throughput. Additionally, we propose an asymmetric lifetime extension strategy for GPUs to amortize CPU embodied carbon, which enhances energy efficiency despite higher initial carbon costs. This approach highlights the potential for sustainable practices in AI, emphasizing the importance of lifecycle-aware optimization in the era of resource-intensive generative models.
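To illustrate the kind of parameterized embodied carbon model the abstract describes, the Python sketch below amortizes each device's embodied carbon over its lifetime and compares a symmetric refresh schedule against one plausible reading of the asymmetric strategy, in which the CPU host outlives GPU refreshes so its embodied carbon is spread over more years. All device names, carbon values, and lifetimes are hypothetical assumptions for illustration, not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    embodied_kgco2e: float  # embodied (manufacturing) carbon in kgCO2e -- hypothetical value
    lifetime_years: float   # operational lifetime over which that carbon is amortized


def amortized_embodied_per_year(devices: list[Device]) -> float:
    """Linearly amortize each device's embodied carbon over its lifetime (kgCO2e/year)."""
    return sum(d.embodied_kgco2e / d.lifetime_years for d in devices)


# Hypothetical baseline: one general-purpose CPU host plus four GPUs,
# with the whole system replaced every 4 years.
baseline = [Device("cpu_host", 1500.0, 4.0)] + [
    Device(f"gpu{i}", 350.0, 4.0) for i in range(4)
]

# Asymmetric schedule: keep the CPU host in service for 8 years while GPUs are
# still refreshed every 4 years, so the host's embodied carbon is amortized
# across two GPU generations.
asymmetric = [Device("cpu_host", 1500.0, 8.0)] + [
    Device(f"gpu{i}", 350.0, 4.0) for i in range(4)
]

for label, system in [("symmetric refresh", baseline), ("asymmetric extension", asymmetric)]:
    print(f"{label}: {amortized_embodied_per_year(system):.1f} kgCO2e/year embodied")
```

With these illustrative numbers, the asymmetric schedule roughly halves the CPU host's annualized embodied carbon while leaving the GPU refresh cadence, and hence the opportunity to adopt more energy-efficient accelerators, unchanged.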