Keywords: LLM, energy footprint
Abstract: The rapid growth of large language models (LLMs) has raised urgent concerns about their energy footprint during training and inference. Existing tools, such as MLPerf and CodeCarbon, provide only coarse estimates and lack reproducible protocols for system-level evaluation of LLM efficiency.
We introduce EnergyLLM-Bench, an open-source framework that unifies in-loop power measurement, FLOPs-based prediction, and standardized JSONL logging into a single reproducible benchmark. All measurements are released through an extensible public leaderboard, enabling transparent comparison across models, hardware, and software configurations.
Our evaluation spans dense and mixture-of-experts architectures, CPUs and GPUs, and multiple optimizer/precision settings. Results reveal several key insights: (i) scaling GPT models raises per-token energy by more than 3×; (ii) GPUs consistently deliver 4–6× higher inference efficiency than CPUs; (iii) BF16 precision reduces energy consumption by 10–15% relative to FP32; and (iv) despite lower FLOPs, mixture-of-experts models can incur orders-of-magnitude higher realized costs due to routing overhead. FLOPs-based predictors, especially gradient boosting, capture these efficiency trends with tighter error bounds than linear baselines.
By consolidating protocol, predictors, and an open leaderboard, EnergyLLM-Bench establishes the first reproducible foundation for analyzing the energy–quality frontier of LLMs. We hope it serves as a principled tool for ML and systems researchers working toward sustainable model design and deployment.
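As a minimal sketch of the FLOPs-based prediction idea the abstract describes, the snippet below fits a gradient-boosting regressor and a linear baseline to map per-run FLOPs to measured energy. The synthetic data, feature choice (log-scaled FLOPs), and hyperparameters are illustrative assumptions, not EnergyLLM-Bench's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic runs: FLOPs vs. measured energy (J), with a mildly nonlinear
# relationship standing in for routing/precision effects (assumed data).
flops = rng.uniform(1e12, 1e15, size=200)
energy = 2e-12 * flops + 5e-27 * flops**2 + rng.normal(0, 50, size=200)

# Log-scale FLOPs as the single feature, since runs span orders of magnitude.
X = np.log10(flops).reshape(-1, 1)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
lin = LinearRegression()
gbr.fit(X, energy)
lin.fit(X, energy)

# A nonlinear predictor should track the energy curve more tightly
# than the linear baseline, mirroring the trend the abstract reports.
gbr_err = np.mean(np.abs(gbr.predict(X) - energy))
lin_err = np.mean(np.abs(lin.predict(X) - energy))
print(f"GBR MAE: {gbr_err:.1f} J, linear MAE: {lin_err:.1f} J")
```

Here the gradient-boosted model yields a lower mean absolute error than the linear fit on the same data, which is the qualitative behavior the abstract attributes to its FLOPs-based predictors.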
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 2837