JENSEN-SHANNON DIVERGENCE IN SAFE MULTI-AGENT RL

Published: 19 Mar 2024, Last Modified: 29 May 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: Safe RL, Reinforcement Learning, Jensen-Shannon, KL Divergence
TL;DR: Jensen-Shannon divergence, owing to its symmetry and smoothness, helps improve performance in terms of rewards and costs in multi-agent safe RL.
Abstract: Reinforcement Learning (RL) has achieved significant milestones; however, safety remains a concern for real-world applications. Safe RL solutions focus on maximizing environment rewards while minimizing costs. In this work, we extend the Multi-Agent Constrained Policy Optimisation (MACPO) approach, which maintains policy consistency using Kullback-Leibler (KL) divergence. We find that Jensen-Shannon (JS) divergence, a symmetric measure, serves as a better alternative to KL divergence; its symmetric nature is more forgiving of extreme differences between policies. Our results demonstrate that JS divergence improves rewards and reduces costs, enhancing safety and performance in multi-agent systems.
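To make the proposed swap concrete, the sketch below contrasts the two divergences for categorical action distributions, as one might use them as a trust-region term in a MACPO-style policy update. This is a minimal illustration in PyTorch, not the authors' implementation; the function names, batch shapes, and the clamping constant are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def kl_divergence(p_logits, q_logits):
    # KL(p || q) for categorical action distributions given as logits.
    # Asymmetric: KL(p || q) != KL(q || p) in general.
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return (p_log.exp() * (p_log - q_log)).sum(dim=-1)

def js_divergence(p_logits, q_logits):
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m = (p + q) / 2.
    # Symmetric and bounded by log(2), so extreme policy differences are penalised less harshly.
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    log_m = m.clamp_min(1e-12).log()
    kl_pm = (p * (p.clamp_min(1e-12).log() - log_m)).sum(dim=-1)
    kl_qm = (q * (q.clamp_min(1e-12).log() - log_m)).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)

# Example: mean divergence between old and updated policies over a batch of states
# (32 states, 5 discrete actions), e.g. for a policy-consistency constraint.
old_logits = torch.randn(32, 5)
new_logits = old_logits + 0.1 * torch.randn(32, 5)
print("KL:", kl_divergence(old_logits, new_logits).mean().item())
print("JS:", js_divergence(old_logits, new_logits).mean().item())
```

In a constrained update, the JS term would simply replace the KL term in the policy-consistency constraint; because JS is symmetric and bounded, large per-state discrepancies contribute at most log(2) rather than growing without bound.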
Submission Number: 162