Leveraging Causal Policy-Reward Entropy for Enhanced Exploration

Published: 19 Mar 2024, Last Modified: 02 May 2024 · ICLR 2024 TinyPapers Withdrawn Submission · CC BY 4.0
Keywords: Reinforcement Learning, Causal Recognition, Actor-Critic
TL;DR: We exploit the shifting significance of action dimensions, identified via causal recognition, and propose a causal policy-reward entropy term. The resulting Causal Actor-Critic (CAC) algorithm outperforms baselines on many benchmark tasks.
Abstract: The impact of taking different actions in reinforcement learning (RL) tasks often varies dynamically during policy learning. We exploit the causal relationship between actions and potential reward gains, proposing a causal policy-reward entropy term. This term effectively identifies and prioritizes actions with high potential impact, enhancing exploration efficiency. Moreover, it can be seamlessly incorporated into any max-entropy RL framework. Our instantiation, termed Causal Actor-Critic (CAC), shows superior performance across a range of continuous control tasks and provides interpretable explanations of the chosen actions.
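The abstract describes weighting a max-entropy objective by the causal influence of each action dimension on reward. The paper's exact formulation is not reproduced on this page, so the following is only a minimal sketch of one way such a term could be instantiated: `causal_weights` is a hypothetical proxy (absolute action-reward correlation, not a true causal estimate), and `weighted_gaussian_entropy` scales each dimension's entropy contribution of a diagonal-Gaussian policy by its weight.

```python
# Hedged sketch, not the authors' implementation: per-dimension "causal"
# weights are approximated by absolute action-reward correlation.
import numpy as np

def causal_weights(actions, rewards, eps=1e-8):
    """Proxy weights: |correlation| between each action dimension and the
    reward, normalized to sum to 1. A stand-in for causal recognition."""
    a = actions - actions.mean(axis=0)          # center each action dim
    r = rewards - rewards.mean()                # center rewards
    corr = np.abs(a.T @ r) / (np.linalg.norm(a, axis=0) * np.linalg.norm(r) + eps)
    return corr / (corr.sum() + eps)

def weighted_gaussian_entropy(log_std, weights):
    """Entropy of a diagonal Gaussian policy, with each dimension's
    contribution scaled by its weight (a plain sum recovers the usual
    max-entropy bonus when weights are uniform)."""
    per_dim = 0.5 * np.log(2.0 * np.pi * np.e) + log_std
    return float(np.sum(weights * per_dim))

rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 3))
rews = 2.0 * acts[:, 0] + 0.1 * rng.normal(size=256)  # dim 0 drives reward
w = causal_weights(acts, rews)
bonus = weighted_gaussian_entropy(np.zeros(3), w)
```

In a SAC-style update, `bonus` would replace the uniform entropy term, so exploration concentrates on the action dimensions currently most consequential for reward.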
Supplementary Material: zip
Submission Number: 63