SDM-RL: Steady-State Divergence Maximization for Robust Reinforcement Learning

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Diverse reinforcement learning, robust reinforcement learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a new, effective diversity measure that induces distinct behavioral policies in reinforcement learning.
Abstract: While reinforcement learning algorithms have achieved human-level performance in complex scenarios, they often falter when subjected to perturbations in test environments. Previous attempts to mitigate this issue have explored training multiple policies with varied behaviors, yet these efforts are compromised by suboptimal choices of diversity measure: such measures often cause training instability or fail to capture the intended diversity among policies. In this work, we offer a unified perspective that ties prior work together under the common framework of maximizing divergence between the steady-state probability distributions induced by different behavioral policies. Most importantly, we introduce a new diversity measure, used simply as an intrinsic reward, that addresses the limitations of prior work. Our theoretical advances are complemented by experimental evidence across a diverse set of benchmarks.
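The abstract only sketches the framework, so the following is a minimal illustrative sketch of the general idea, assuming a discrete state space and a log density-ratio intrinsic bonus. The function names (`empirical_state_distribution`, `intrinsic_reward`) and the specific KL/log-ratio form are assumptions for illustration, not the paper's actual measure.

```python
import numpy as np

def empirical_state_distribution(trajectories, n_states, smoothing=1e-6):
    """Estimate a policy's steady-state (state-visitation) distribution
    from discrete states observed along its trajectories."""
    counts = np.full(n_states, smoothing)  # smoothing keeps all probabilities > 0
    for traj in trajectories:
        for s in traj:
            counts[s] += 1.0
    return counts / counts.sum()

def kl_divergence(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def intrinsic_reward(state, d_new, d_prev):
    """Log density-ratio bonus: reward states the new policy visits
    more often than previously trained policies do on average."""
    avg_prev = np.mean([d[state] for d in d_prev])
    return float(np.log(d_new[state] / avg_prev))
```

In this kind of scheme, the bonus would be added to the task reward when training each successive policy, pushing its steady-state distribution away from those of the policies already learned.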
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: pdf
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4093