Efficient Value Propagation with the Compositional Optimality Equation

Published: 03 Nov 2023, Last Modified: 27 Nov 2023, GCRL Workshop
Confirmation: I have read and confirm that at least one author will be attending the workshop in person if the submission is accepted
Keywords: Goal-conditioned Reinforcement Learning, Reinforcement Learning, Sample efficiency
TL;DR: We propose an alternative to the Bellman equation, specialized for goal-conditioned environments, that leads to significant speed-ups in sample efficiency.
Abstract: Goal-Conditioned Reinforcement Learning (GCRL) is about learning to reach predefined goal states. Real-world GCRL is crucial for adaptive robotics. Existing GCRL methods, however, suffer from low sample efficiency, and collecting real-world data is expensive. Here we introduce the Compositional Optimality Equation (COE) for a widely used class of deterministic environments in which a reward is obtained only upon reaching a goal state. COE is a novel alternative to the standard Bellman Optimality Equation that leads to more sample-efficient update rules. The Bellman update combines the immediate reward with the bootstrapped estimate of the best next state. Our COE-based update rule instead combines the best composition of two bootstrapped estimates through an arbitrary intermediate subgoal state. In the tabular setting, the new update rule is guaranteed to converge to the optimal value function exponentially faster than the Bellman update. COE can also be used to derive compositional variants of conventional (deep) RL algorithms. In particular, our COE-based version of DDPG is more sample-efficient than DDPG in a continuous grid world.
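
To make the contrast in the abstract concrete, here is a hedged sketch of the two optimality equations it describes, assuming a deterministic transition function $f$, a discount $\gamma < 1$, and the value convention $V^*(s, g) = \gamma^{d(s, g)}$ with $d$ the shortest-path length to the goal; the paper's exact formulation may differ.

```latex
\begin{align}
  % Bellman optimality (goal-conditioned, sparse reward):
  % the value of s is the discounted value of the best next state.
  V^*(s, g) &= \max_{a}\; \gamma\, V^*\!\bigl(f(s, a), g\bigr),
  & V^*(g, g) &= 1, \\
  % Compositional optimality (sketch): the value of s is the best
  % composition of two bootstrapped estimates through a subgoal w.
  V^*(s, g) &= \max_{w}\; V^*(s, w)\, V^*(w, g).
\end{align}
```

Under these assumptions, each sweep of the compositional update roughly doubles the horizon over which accurate values have propagated, which is consistent with the abstract's claim of exponentially faster convergence in the tabular setting.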
Submission Number: 14