Hybrid Quantum-Classical Policy Gradients for Multi-Agent Reinforcement Learning: A Principled Analysis of Expressivity and Trade-offs

ICLR 2026 Conference Submission 14246 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Quantum Computing, Multi-Agent Systems, Reinforcement Learning, Quantum Machine Learning, Representation Learning
Abstract: We conduct a rigorous analysis of hybrid quantum-classical policy gradient methods in multi-agent reinforcement learning (MARL), focusing on a precise characterization of where quantum advantages can arise. We prove that for policy classes exhibiting high correlation—quantified using the standard information-theoretic measure of Total Correlation—quantum variational circuits offer an exponential advantage in representation over standard classical networks. We introduce QC-MAPPO, a hybrid quantum-classical variant of MAPPO, with a complete technical specification. To fairly assess its benefits, we conduct comprehensive experiments on the challenging StarCraft Multi-Agent Challenge (SMAC) benchmark, comparing against both standard and transformer-based classical baselines. The results show that QC-MAPPO achieves statistically significant improvements in sample efficiency and final performance on tasks requiring tight coordination, with the advantage widening as the number of agents increases. We transparently analyze the trade-offs, including the exponential simulation overhead and the role of implicit regularization, providing a principled and sober assessment of the potential for quantum-enhanced MARL.
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 14246