Keywords: LLM, energy footprint
Abstract: The rapid growth of large language models (LLMs) has raised urgent concerns about their energy footprint during training and inference. Existing tools, such as MLPerf and CodeCarbon, provide only coarse estimates and lack reproducible protocols for system-level evaluation of LLM efficiency.
We introduce EnergyLLM-Bench, an open-source framework that unifies in-loop power measurement, FLOPs-based prediction, and standardized JSONL logging into a single reproducible benchmark. All measurements are released through an extensible public leaderboard, enabling transparent comparison across models, hardware, and software configurations.
Our evaluation spans dense and mixture-of-experts architectures, CPUs and GPUs, and multiple optimizer/precision settings. Results reveal several key insights: (i) scaling GPT models raises per-token energy by more than 3×; (ii) GPUs consistently deliver 4–6× higher inference efficiency than CPUs; (iii) BF16 precision reduces energy consumption by 10–15% relative to FP32; and (iv) despite lower FLOPs, mixture-of-experts models can incur orders-of-magnitude higher realized costs due to routing overhead. FLOPs-based predictors, especially gradient boosting, capture these efficiency trends with tighter error bounds than linear baselines.
By consolidating protocol, predictors, and an open leaderboard, EnergyLLM-Bench establishes the first reproducible foundation for analyzing the energy–quality frontier of LLMs. We hope it serves as a principled tool for ML and systems researchers working toward sustainable model design and deployment.
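As a minimal sketch of the FLOPs-based prediction idea the abstract describes, the snippet below fits a gradient-boosting regressor and a linear baseline to map per-run FLOPs to measured energy. The synthetic data, feature choice (log-scaled FLOPs), and hyperparameters are illustrative assumptions, not EnergyLLM-Bench's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic runs: FLOPs vs. measured energy (J), with a mildly nonlinear
# relationship standing in for routing/precision effects (assumed data).
flops = rng.uniform(1e12, 1e15, size=200)
energy = 2e-12 * flops + 5e-27 * flops**2 + rng.normal(0, 50, size=200)

# Log-scale FLOPs as the single feature, since runs span orders of magnitude.
X = np.log10(flops).reshape(-1, 1)

gbr = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=0)
lin = LinearRegression()
gbr.fit(X, energy)
lin.fit(X, energy)

# A nonlinear predictor should track the energy curve more tightly
# than the linear baseline, mirroring the trend the abstract reports.
gbr_err = np.mean(np.abs(gbr.predict(X) - energy))
lin_err = np.mean(np.abs(lin.predict(X) - energy))
print(f"GBR MAE: {gbr_err:.1f} J, linear MAE: {lin_err:.1f} J")
```

Here the gradient-boosted model yields a lower mean absolute error than the linear fit on the same data, which is the qualitative behavior the abstract attributes to its FLOPs-based predictors.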
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 2837