Efficient Value Propagation with the Compositional Optimality Equation

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Reinforcement Learning, Goal-Conditioned Reinforcement Learning, Robotics
TL;DR: We propose an alternative to the Bellman equation, specialized for goal-conditioned environments, that leads to significant gains in sample efficiency.
Abstract: Goal-Conditioned Reinforcement Learning (GCRL) is about learning to reach predefined goal states. GCRL in the real world is crucial for adaptive robotics. Existing GCRL methods, however, suffer from low sample efficiency and the high cost of collecting real-world data. Here, we introduce the Compositional Optimality Equation (COE) for a widely used class of deterministic environments in which a reward is obtained only upon reaching a goal state. COE is a novel alternative to the standard Bellman Optimality Equation and leads to more sample-efficient update rules. The Bellman update uses the immediate reward and the bootstrapped estimate of the best next state. Our COE-based update rule instead uses the best composition of two bootstrapped estimates at an arbitrary intermediate subgoal state. In tabular settings, the new update rule is guaranteed to converge to the optimal value function exponentially faster than the Bellman update. COE can also be used to derive compositional variants of conventional (deep) RL algorithms. In particular, our COE-based version of DDPG is more sample-efficient than standard DDPG in a continuous grid world.
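For concreteness, the following is a minimal sketch of the two kinds of update contrasted in the abstract; the notation (deterministic transition f, goal g, intermediate subgoal w, discount gamma) is illustrative and not necessarily the paper's exact formulation of COE.

```latex
% A minimal sketch (our notation, not necessarily the paper's): deterministic
% transition s' = f(s,a), goal g, intermediate subgoal w, discount gamma.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
\begin{align}
  % Standard Bellman optimality update: bootstrap on the best next state.
  V(s,g) &\leftarrow \max_{a}\,\bigl[\, r(s,a,g) + \gamma\, V\bigl(f(s,a),\, g\bigr) \bigr]
  \\
  % Compositional update in the spirit of the abstract: compose two
  % bootstrapped estimates through the best intermediate subgoal w.
  V(s,g) &\leftarrow \max_{w}\; V(s,w)\, V(w,g)
\end{align}
\end{document}
```

Multiplicative composition of this kind is natural when optimal values take the form gamma raised to a path length, as in deterministic goal-reaching tasks with a single terminal reward: a Bellman sweep extends accurate value estimates by one step at a time, whereas a compositional sweep can roughly double the path length over which value has propagated, which is consistent with the abstract's claim of exponentially faster convergence in the tabular case.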
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8322