CIM: Constrained Intrinsic Motivation for Reinforcement Learning

19 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Reinforcement Learning, Intrinsic Motivation, Skill Discovery
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: This paper investigates two fundamental problems that arise when implementing intrinsic motivation for reinforcement learning: 1) how to design a proper intrinsic objective for Reward-Free Pre-Training (RFPT), and 2) how to reduce the bias introduced by the intrinsic objective in Exploration with Intrinsic Motivation (EIM). Existing intrinsic motivation methods suffer from static skills, limited state coverage, and sample inefficiency in RFPT, and from suboptimality in EIM. To tackle these problems, we propose \emph{Constrained Intrinsic Motivation (CIM)} for RFPT and for EIM separately. CIM for RFPT maximizes a novel lower bound of the state entropy with an alignment constraint on the skill and state representations, enabling efficient dynamic skill discovery and state coverage maximization. CIM for EIM leverages constrained policy optimization to adaptively adjust the temperature parameter of the intrinsic reward, reducing the bias it introduces. Across multiple MuJoCo robotics environments and tasks, we empirically show that CIM for RFPT achieves greatly improved performance and sample efficiency over state-of-the-art intrinsic motivation methods. We also showcase the effectiveness of CIM for EIM in making intrinsic rewards beneficial when extrinsic rewards are available from the beginning.
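To make the EIM idea concrete, below is a minimal sketch of one way an adaptive temperature on the intrinsic reward could be realized as a Lagrangian-style dual update in the spirit of constrained policy optimization. This is not the authors' implementation: the class name, the constraint form (keeping the average intrinsic bonus below a budget `epsilon`), and all hyperparameters are illustrative assumptions.

```python
import numpy as np

class AdaptiveIntrinsicTemperature:
    """Sketch of an adaptive temperature for the intrinsic reward.

    Assumed setup: the shaped reward is r_ext + temp * r_int, and a dual
    gradient step keeps the average intrinsic bonus near a budget `epsilon`.
    """

    def __init__(self, init_temp=1.0, lr=1e-3, epsilon=0.1):
        self.log_temp = np.log(init_temp)  # optimize in log space so temp stays positive
        self.lr = lr
        self.epsilon = epsilon             # hypothetical intrinsic-reward budget

    @property
    def temp(self):
        return float(np.exp(self.log_temp))

    def update(self, intrinsic_rewards):
        """Dual step: shrink the temperature when the average intrinsic
        bonus exceeds the budget, grow it otherwise."""
        constraint_gap = float(np.mean(intrinsic_rewards)) - self.epsilon
        self.log_temp -= self.lr * constraint_gap
        return self.temp

    def shaped_reward(self, r_ext, r_int):
        """Combine extrinsic and intrinsic rewards with the current temperature."""
        return np.asarray(r_ext) + self.temp * np.asarray(r_int)
```

Under these assumptions, the temperature decays automatically as the agent's intrinsic bonuses grow large relative to the budget, which is one simple way to curb the bias that a fixed intrinsic-reward coefficient would otherwise introduce once extrinsic rewards are available.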
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1838