CarbonGearRL: Precision-Elastic, Carbon-Aware Scheduling for Foundation-Model Training

Published: 11 Jun 2025, Last Modified: 10 Jul 2025 · ES-FoMo III · CC BY 4.0
Keywords: carbon-aware training, mixed precision, reinforcement learning, foundation models, energy efficiency
TL;DR: We train large language models with roughly half the CO$_2$ by letting an RL agent dial GPU count \emph{and} precision up or down based on the live carbon intensity of the power grid.
Abstract: The carbon footprint of training large language models now rivals that of entire data centres, yet most optimisation efforts treat accelerator count and numeric precision as static hyperparameters. We introduce \textbf{CarbonGearRL}, an end-to-end system that \emph{jointly} schedules cluster width and arithmetic precision against real-time grid carbon signals. A dual-driven soft Q-learning scheduler widens the cluster and drops to FP8 during low-carbon windows, then narrows it and reverts to BF16 when emissions peak, while a precision-adaptive AdamW provides provable stability under stochastic quantisation noise. We derive sublinear carbon regret relative to a clairvoyant oracle and match the $\mathcal{O}(1/\sqrt{B})$ convergence rate of fixed-precision baselines. On 13B and 70B LLaMA-style models, our prototype cuts CO$_2$e by up to 52\% without throughput loss.
Submission Number: 106
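To make the scheduling idea concrete, here is a minimal, self-contained sketch of a soft Q-learning scheduler that chooses a (GPU count, precision) action from the current grid carbon intensity. It is not the paper's dual-driven scheduler: the action set, reward weights, energy and throughput constants, and the tabular soft-Bellman update below are illustrative assumptions only.

```python
import math
import random
from collections import defaultdict

# Hypothetical discrete action space: (number of GPUs, numeric precision).
ACTIONS = [(gpus, prec) for gpus in (64, 128, 256) for prec in ("fp8", "bf16")]
TEMPERATURE = 1.0          # soft Q-learning entropy temperature (assumed value)
ALPHA, GAMMA = 0.1, 0.95   # learning rate and discount (assumed values)

Q = defaultdict(float)     # tabular Q-values keyed by (state, action)

def carbon_bucket(g_co2_per_kwh: float) -> int:
    """Discretise grid carbon intensity (gCO2/kWh) into coarse states."""
    return min(int(g_co2_per_kwh // 100), 5)

def soft_policy(state: int):
    """Boltzmann (soft-max) policy over Q-values, as used in soft Q-learning."""
    logits = [Q[(state, a)] / TEMPERATURE for a in ACTIONS]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return random.choices(ACTIONS, weights=[w / total for w in weights], k=1)[0]

def reward(action, g_co2_per_kwh: float) -> float:
    """Toy reward: throughput benefit minus carbon cost (constants are made up)."""
    gpus, prec = action
    throughput = gpus * (1.6 if prec == "fp8" else 1.0)   # assume FP8 ~1.6x faster
    energy_kwh = gpus * (0.5 if prec == "fp8" else 0.7)   # assumed per-step energy
    return throughput - 0.01 * energy_kwh * g_co2_per_kwh

def step(g_now: float, g_next: float):
    """One scheduling decision: act, observe reward, apply a soft-Bellman update."""
    s, s_next = carbon_bucket(g_now), carbon_bucket(g_next)
    a = soft_policy(s)
    r = reward(a, g_now)
    # Soft value of next state: temperature-scaled log-sum-exp of its Q-values.
    v_next = TEMPERATURE * math.log(
        sum(math.exp(Q[(s_next, b)] / TEMPERATURE) for b in ACTIONS)
    )
    Q[(s, a)] += ALPHA * (r + GAMMA * v_next - Q[(s, a)])
    return a, r
```

Calling `step(g_now, g_next)` once per scheduling interval with successive carbon-intensity readings would, under these toy dynamics, gradually bias the policy toward wide FP8 configurations when intensity is low and narrower BF16 configurations when it is high, mirroring the behaviour described in the abstract.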