Keywords: Sustainability, Green, Transformers, Energy, FLOPs, Profiling, Amplification, LLMs, Repeated Sampling, Attention, MLP, Feedforward, Inference, Hooks, Activations
TL;DR: A novel methodology to measure the energy consumed by individual Transformer components, together with an empirical analysis across models showing that per-FLOP energy consumption differs across components.
Abstract: The rapid adoption of Large Language Models (LLMs) has raised significant environmental concerns. Unlike the one-time cost of training, LLM inference occurs continuously at a global scale and now dominates the AI energy footprint. Yet most sustainability studies report only coarse, model-level metrics due to the lack of fine-grained measurement methods, treating energy efficiency as an afterthought rather than a primary objective. We present the first fine-grained empirical analysis of inference energy across the core components of the Transformer architecture. We propose a novel methodology, Component-Level Energy Assessment via Repeated sampling (CLEAR), to overcome the temporal mismatch between microsecond ($\mu$s)-scale component execution and millisecond (ms)-scale energy-sensor monitoring. Using CLEAR, we evaluate 15 models spanning four distinct architecture types, consistently keeping component-wise energy variance below 9.5% while capturing more than 90% of each model's total energy at the component level. Our empirical analysis reveals that Attention blocks consume significantly more energy per floating-point operation (FLOP) than other components, indicating that energy consumption is not proportional to FLOP count. This shows that FLOPs alone fail to capture the true energy cost at the component level. Our findings establish detailed component-level energy baselines and offer a first step toward building energy-efficient Transformer models through component-level optimization.
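To make the repeated-sampling idea concrete, below is a minimal sketch of how component-level energy could be measured; this is my own reconstruction based on the abstract, not the authors' released code. A single attention or MLP block executes in microseconds while NVML's energy counter updates at millisecond granularity, so the component's forward pass is repeated many times and the measured energy is amortized over the repetitions. Function and parameter names such as `measure_component_energy` and `n_repeats` are illustrative assumptions.

```python
# Sketch: amortized component-level energy measurement via repeated sampling.
# Assumes a CUDA GPU whose NVML driver exposes the total-energy counter
# (nvmlDeviceGetTotalEnergyConsumption, available on Volta and newer).
import pynvml
import torch


def measure_component_energy(component: torch.nn.Module,
                             example_input: torch.Tensor,
                             n_repeats: int = 10_000) -> float:
    """Estimate the energy (in joules) of one forward pass of `component`."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    component = component.cuda().eval()
    example_input = example_input.cuda()

    with torch.no_grad():
        # Warm-up so kernel selection and caching do not pollute the reading.
        for _ in range(100):
            component(example_input)
        torch.cuda.synchronize()

        # NVML reports cumulative energy in millijoules since driver load.
        e_start = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
        for _ in range(n_repeats):
            component(example_input)
        torch.cuda.synchronize()
        e_end = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

    pynvml.nvmlShutdown()
    # Convert mJ -> J and amortize over the repetitions.
    return (e_end - e_start) / 1000.0 / n_repeats
```

In practice, the representative inputs for each block could be captured with forward hooks on the full model (consistent with the "hooks" and "activations" keywords), so that each component is replayed on realistic activations rather than synthetic tensors.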
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 20277