ENERGYLLM-BENCH: A REPRODUCIBLE BENCHMARK FOR ENERGY AND CARBON FOOTPRINT OF LARGE LANGUAGE MODELS

07 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: LLM, energy footprint
Abstract: The rapid growth of large language models (LLMs) has raised urgent concerns about their energy footprint during training and inference. Existing tools, such as MLPerf and CodeCarbon, provide only coarse estimates and lack reproducible protocols for system-level evaluation of LLM efficiency.
We introduce EnergyLLM-Bench, an open-source framework that unifies in-loop power measurement, FLOPs-based prediction, and standardized JSONL logging into a single reproducible benchmark. All measurements are released through an extensible public leaderboard, enabling transparent comparison across models, hardware, and software configurations.
Our evaluation spans dense and mixture-of-experts architectures, CPUs and GPUs, and multiple optimizer/precision settings. Results reveal several key insights: (i) scaling GPT models raises per-token energy by more than 3×; (ii) GPUs consistently deliver 4–6× higher inference efficiency than CPUs; (iii) BF16 precision reduces energy consumption by 10–15% relative to FP32; and (iv) despite lower FLOPs, mixture-of-experts models can incur orders-of-magnitude higher realized costs due to routing overhead. FLOPs-based predictors, especially gradient boosting, capture these efficiency trends with tighter error bounds than linear baselines.
By consolidating protocol, predictors, and an open leaderboard, EnergyLLM-Bench establishes the first reproducible foundation for analyzing the energy–quality frontier of LLMs. We hope it serves as a principled tool for ML and systems researchers working toward sustainable model design and deployment.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 2837
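The abstract describes standardized JSONL logging of per-run energy measurements. The paper does not specify the record schema, so the sketch below assumes a hypothetical set of fields (model, hardware, precision, tokens, energy in joules) and derives a per-token energy figure; it is illustrative only, not the benchmark's actual format.

```python
import io
import json
import time


def log_run(fh, model, hardware, precision, tokens, energy_joules):
    """Append one measurement as a JSON line (hypothetical schema)."""
    record = {
        "timestamp": time.time(),
        "model": model,
        "hardware": hardware,
        "precision": precision,
        "tokens": tokens,
        "energy_joules": energy_joules,
        # Derived metric: energy cost per generated token.
        "joules_per_token": energy_joules / tokens,
    }
    fh.write(json.dumps(record) + "\n")


# In-memory buffer stands in for an append-only log file.
buf = io.StringIO()
log_run(buf, model="gpt-small", hardware="A100", precision="bf16",
        tokens=4096, energy_joules=512.0)

# Each line round-trips independently, which is what makes JSONL
# convenient for aggregating runs into a leaderboard.
line = json.loads(buf.getvalue())
print(line["joules_per_token"])  # → 0.125
```

One record per line keeps the log append-only and trivially mergeable across machines, which suits the cross-hardware comparisons the benchmark targets.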