Sustainable Control of Geo-Distributed Datacenters by Distilling Numerical Experts into Adaptive LLM Agents

Antonio Guillen-Perez; Ashwin Ramesh Babu; Sahand Ghorbanpour; Avisek Naug; Vineet Gundecha; Sifat Muhammad Abdullah; Ricardo Luna Gutierrez; Soumyendu Sarkar

Published: 30 Oct 2025, Last Modified: 04 Nov 2025. Venue: MLForSys 2025. Readers: Everyone. License: CC BY 4.0

Keywords: Sustainability, LLM Agents, Policy Distillation, Sustainable Computing, Geo-Distributed Data Centers, Datacenter Orchestration, Optimization, RL, Adaptive Control, Carbon Footprint, Energy

TL;DR: We distill policies from numerical experts into adaptive LLM agents, creating scalable and manageable controllers that can adapt to new operator commands in minutes rather than hours or days.

Abstract: The sustainable control of geo-distributed datacenters is a critical systems challenge, defined by large-scale, dynamic, and uncertain operating conditions. While specialized numerical experts, such as those from Reinforcement Learning (RL) or Model Predictive Control (MPC), can be trained to find optimal control policies, their practical deployment is blocked by fundamental systems-level flaws: they are brittle, failing to scale with the system; opaque, preventing operator trust; and rigid, unable to adapt to new runtime objectives. This paper introduces a novel framework that directly addresses these issues by distilling the policy of a numerical expert into an adaptive LLM agent. Our method transforms the expert's opaque logic into a transparent, interactive, and agentic workflow. To validate this approach, we distill a state-of-the-art RL policy for carbon-aware workload orchestration. Evaluated in a high-fidelity simulation, our resulting LLM agent demonstrates the capabilities essential for real-world systems deployment. It solves the scalability problem, successfully managing topologies more than three times larger than the expert's training environment. It enables true runtime adaptability, altering its strategy in minutes in response to complex operator commands that would require days of costly retraining for the original expert. By making powerful optimizers manageable and resilient, our work offers a practical pathway to the sustainable control of large-scale computer systems.
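
As a rough illustration of what "distilling the policy of a numerical expert" can look like in practice, the sketch below collects (prompt, completion) pairs by querying an expert on sampled states and rendering each state as text. It is a minimal sketch under stated assumptions, not the authors' pipeline: the greedy `expert_policy` is a toy stand-in for the RL expert, and `SiteState`, `render_prompt`, `collect_distillation_pairs`, the units, and the sampling ranges are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
import random


@dataclass
class SiteState:
    """Hypothetical per-datacenter observation."""
    name: str
    carbon_intensity: float  # gCO2/kWh (assumed unit)
    free_capacity: float     # fraction of capacity available, in [0, 1]


def expert_policy(sites, job_size):
    """Toy stand-in for a numerical expert: pick the cleanest site that fits the job."""
    feasible = [s for s in sites if s.free_capacity >= job_size]
    if not feasible:
        return "DEFER"
    return min(feasible, key=lambda s: s.carbon_intensity).name


def render_prompt(sites, job_size):
    """Serialize the numeric state as text; the format is independent of the site count."""
    lines = [f"Job needs {job_size:.2f} of one site's capacity."]
    for s in sites:
        lines.append(
            f"- {s.name}: carbon={s.carbon_intensity:.0f} gCO2/kWh, "
            f"free={s.free_capacity:.2f}"
        )
    lines.append("Answer with a site name or DEFER.")
    return "\n".join(lines)


def collect_distillation_pairs(n_samples, n_sites, seed=0):
    """Sample random states, query the expert, and record text supervision pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        sites = [
            SiteState(f"DC{i}", rng.uniform(50, 700), rng.uniform(0, 1))
            for i in range(n_sites)
        ]
        job_size = rng.uniform(0.05, 0.5)
        pairs.append(
            {
                "prompt": render_prompt(sites, job_size),
                "completion": expert_policy(sites, job_size),
            }
        )
    return pairs
```

Pairs of this shape could serve as supervised fine-tuning data or as in-context examples for an LLM agent. Because the state is rendered as a variable-length list, the same prompt format applies to topologies with more sites than the expert was trained on, though whether the agent's decisions remain good at that scale is an empirical question that this sketch does not address.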

Submission Number: 8
