Keywords: Multi-Agent Reinforcement Learning, Networked Systems, Domain Generalization, Domain Adaptation, Causal Structure Learning
TL;DR: We propose a generalizable multi-agent reinforcement learning framework that exploits causal structure, enabling efficient adaptation across networked systems under domain shifts, supported by theoretical guarantees and validated in numerical experiments.
Abstract: Large-scale networked systems, such as traffic, power, and wireless grids, challenge reinforcement-learning agents with both scale and environment shifts. To address these challenges, we propose \texttt{GSAC} (\textbf{G}eneralizable and \textbf{S}calable \textbf{A}ctor-\textbf{C}ritic), a framework that couples causal representation learning with meta actor-critic learning to achieve both scalability and domain generalization. Each agent first learns a sparse local causal mask that provably identifies the minimal neighborhood variables influencing its dynamics, yielding exponentially tight approximately compact representations (ACRs) of state and domain factors. These ACRs bound the error of truncating value functions to $\kappa$-hop neighborhoods, enabling efficient learning on graphs. A meta actor-critic then trains a shared policy across multiple source domains while conditioning on the compact domain factors; at test time, a few trajectories suffice to estimate the new domain factor and deploy the adapted policy. We establish finite-sample guarantees on causal recovery, actor-critic convergence, and the adaptation gap, and show that \texttt{GSAC} adapts rapidly and significantly outperforms learning-from-scratch and conventional adaptation baselines.
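To make the two main ingredients of the abstract concrete, here is a minimal sketch (not the authors' implementation) of (i) truncating each agent's view to its $\kappa$-hop neighborhood on the interaction graph and (ii) conditioning a local policy on a compact domain factor estimated from a few test-time trajectories. All names (`kappa_hop_neighbors`, `LocalPolicy`, `estimate_domain_factor`) and the moment-based domain descriptor are hypothetical illustrations, not \texttt{GSAC}'s actual causal-mask learner or meta actor-critic.

```python
import numpy as np

def kappa_hop_neighbors(adj, agent, kappa):
    """Return the nodes within `kappa` hops of `agent` on the adjacency matrix `adj`."""
    frontier, visited = {agent}, {agent}
    for _ in range(kappa):
        frontier = {j for i in frontier for j in np.flatnonzero(adj[i])} - visited
        visited |= frontier
    return sorted(visited)

class LocalPolicy:
    """Toy linear-softmax policy over the truncated local state plus a domain factor."""
    def __init__(self, local_dim, domain_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((n_actions, local_dim + domain_dim))

    def act(self, local_state, domain_factor, rng):
        logits = self.W @ np.concatenate([local_state, domain_factor])
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(p), p=p)

def estimate_domain_factor(trajectories):
    """Stand-in for test-time domain-factor estimation: simple moments of the
    observed transition increments serve as a low-dimensional domain descriptor."""
    deltas = np.concatenate([t[1:] - t[:-1] for t in trajectories], axis=0)
    return np.array([deltas.mean(), deltas.std()])

# Toy usage: a ring of 6 agents, kappa = 2, few-shot adaptation for agent 0.
adj = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
nbrs = kappa_hop_neighbors(adj, agent=0, kappa=2)        # agent 0's kappa-hop neighborhood
policy = LocalPolicy(local_dim=len(nbrs), domain_dim=2, n_actions=3)

rng = np.random.default_rng(1)
few_shot_trajs = [rng.standard_normal((20, len(nbrs))) for _ in range(3)]
z_hat = estimate_domain_factor(few_shot_trajs)           # estimate the new domain factor
local_state = rng.standard_normal(len(nbrs))
action = policy.act(local_state, z_hat, rng)             # deploy the adapted policy
```

In this sketch the neighborhood truncation plays the role of the ACR-based value truncation, and the shared policy adapts to a new domain only through the estimated factor `z_hat`, mirroring the few-trajectory adaptation described above.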
Primary Area: Reinforcement learning (e.g., decision and control, planning, hierarchical RL, robotics)
Submission Number: 24477