Interpretable LLM Control for Sustainable Liquid Cooling in HPC Data Centers

Published: 01 Jul 2025, Last Modified: 08 Jul 2025 · CO-BUILD Poster · CC BY 4.0
Keywords: LLM in Sustainability, Sustainability, Interpretability, Data Centers, Liquid Cooling, LLM control, LLM-RL Hybrid Controller, Energy Efficiency, Real-Time Systems
TL;DR: We present an interpretable multi-agent system combining LLMs and RL to optimize liquid cooling in data centers for sustainability, achieving energy savings and improved reliability.
Abstract: The rise of AI workloads has driven the need for efficient liquid cooling in high-density data centers, yet current systems lack intelligent, interpretable control. We propose a novel framework combining Reinforcement Learning (RL) with Large Language Models (LLMs) to optimize end-to-end liquid cooling, from server cabinets to the cooling towers, while providing natural language explanations for control actions. Our approach is a hybrid controller that couples multi-agent Reinforcement Learning with a Large Language Model. Evaluated against a baseline on a scalable Modelica liquid-cooling model of Oak Ridge National Laboratory's Frontier supercomputer, it improves temperature stability and energy efficiency, offering a scalable and transparent solution for sustainable data center cooling.
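The abstract's hybrid design can be sketched in miniature: an RL policy maps observed coolant temperatures to a control setpoint, and a separate language layer renders each action as a natural-language rationale. Everything below is illustrative and hypothetical — the class and function names, the proportional heuristic standing in for a trained policy, and the template standing in for the LLM are assumptions, not the paper's implementation.

```python
class RLCoolingPolicy:
    """Toy stand-in for a trained multi-agent RL policy (hypothetical).

    Maps an observed coolant supply temperature (deg C) to a
    pump-speed setpoint in [0, 1].
    """

    def act(self, supply_temp_c: float, target_temp_c: float = 30.0) -> float:
        # Proportional-style heuristic in place of a learned policy:
        # hotter coolant -> higher pump speed, clipped to [0, 1].
        error = supply_temp_c - target_temp_c
        return max(0.0, min(1.0, 0.5 + 0.1 * error))


def explain_action(supply_temp_c: float, setpoint: float) -> str:
    """Stand-in for the LLM layer: turns a control action into a
    natural-language rationale (a fixed template here; the paper's
    system would query an LLM instead)."""
    direction = "raising" if setpoint > 0.5 else "holding or lowering"
    return (
        f"Coolant supply is {supply_temp_c:.1f} deg C, so the controller is "
        f"{direction} pump speed to {setpoint:.2f} of maximum to keep "
        f"temperatures stable while avoiding excess pumping energy."
    )


if __name__ == "__main__":
    policy = RLCoolingPolicy()
    for temp in (28.0, 33.0):
        setpoint = policy.act(temp)
        print(explain_action(temp, setpoint))
```

The point of the pairing is that every setpoint the (real) RL agent emits is accompanied by a human-readable justification, which is what makes the control loop auditable by data-center operators.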
Submission Number: 2