Keywords: Deep Reinforcement Learning; Maximum Entropy Reinforcement Learning
TL;DR: We aim to maximize the next-state entropy by constructing an action-mapping layer and maximizing the policy entropy of the inner policy.
Abstract: Maximum entropy algorithms have demonstrated significant progress in Reinforcement Learning~(RL), which offers an additional guidance in the form of entropy, particularly beneficial in tasks with sparse rewards. Nevertheless, current approaches grounded in policy entropy encourage the agent to explore diverse actions, yet they do not directly help agent explore diverse states. In this study, we theoretically reveal the challenge for optimizing the next-state entropy of agent. To address this limitation, we introduce Maximum Next-State Entropy (MNSE), a novel method which maximizes next-state entropy through an action mapping layer following the inner policy. We provide a theoretical analysis demonstrating that MNSE can maximize next-state entropy by optimizing the action entropy of the inner policy. We conduct extensive experiments on various continuous control tasks and show that MNSE can significantly improve the exploration capability of RL algorithms.
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5644
Loading