JENSEN-SHANNON DIVERGENCE IN SAFE MULTI-AGENT RL

Published: 19 Mar 2024, Last Modified: 29 May 2024 · Tiny Papers @ ICLR 2024 · CC BY 4.0
Keywords: Safe RL, Reinforcement Learning, Jensen-Shannon, KL Divergence
TL;DR: Jensen-Shannon divergence, owing to its symmetry and smoothness, helps improve performance in terms of rewards and costs in multi-agent safe RL.
Abstract: Reinforcement Learning (RL) has achieved significant milestones; however, safety remains a concern for real-world applications. Safe RL solutions focus on maximizing environment rewards while minimizing costs. In this work, we extend the Multi-Agent Constrained Policy Optimisation (MACPO) approach, which maintains policy consistency using Kullback-Leibler (KL) divergence. We find that Jensen-Shannon (JS) divergence, a symmetric measure, serves as a better alternative to KL divergence; its symmetric nature is more forgiving of extreme differences between policies. Our results demonstrate that JS divergence improves rewards and reduces costs, enhancing safety and performance in multi-agent systems.
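To make the proposed swap concrete, the sketch below contrasts the two divergences for categorical action distributions, as one might use them as a trust-region term in a MACPO-style policy update. This is a minimal illustration in PyTorch, not the authors' implementation; the function names, batch shapes, and the clamping constant are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def kl_divergence(p_logits, q_logits):
    # KL(p || q) for categorical action distributions given as logits.
    # Asymmetric: KL(p || q) != KL(q || p) in general.
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    return (p_log.exp() * (p_log - q_log)).sum(dim=-1)

def js_divergence(p_logits, q_logits):
    # JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m = (p + q) / 2.
    # Symmetric and bounded by log(2), so extreme policy differences are penalised less harshly.
    p = F.softmax(p_logits, dim=-1)
    q = F.softmax(q_logits, dim=-1)
    m = 0.5 * (p + q)
    log_m = m.clamp_min(1e-12).log()
    kl_pm = (p * (p.clamp_min(1e-12).log() - log_m)).sum(dim=-1)
    kl_qm = (q * (q.clamp_min(1e-12).log() - log_m)).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)

# Example: mean divergence between old and updated policies over a batch of states
# (32 states, 5 discrete actions), e.g. for a policy-consistency constraint.
old_logits = torch.randn(32, 5)
new_logits = old_logits + 0.1 * torch.randn(32, 5)
print("KL:", kl_divergence(old_logits, new_logits).mean().item())
print("JS:", js_divergence(old_logits, new_logits).mean().item())
```

In a constrained update, the JS term would simply replace the KL term in the policy-consistency constraint; because JS is symmetric and bounded, large per-state discrepancies contribute at most log(2) rather than growing without bound.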
Submission Number: 162