Goal2FlowNet: Learning Diverse Policy Covers using GFlowNets for Goal-Conditioned RL

17 Sept 2023 (modified: 11 Feb 2024). Submitted to ICLR 2024.
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: GFlowNets, Goal-Conditioned Reinforcement Learning, Generalization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We show the value of learning diverse goal-conditioned policies using GFlowNets.
Abstract: Goal-Conditioned Reinforcement Learning is a promising direction for learning policies that reach a diverse set of goals, yielding flexible and adaptable agents capable of solving multiple tasks. However, many current approaches train policies that explore only a subset of the state space, or that learn to achieve only a subset of goals in a limited number of ways, which makes the learned policies suboptimal. Such policies are brittle: they become ineffective when the agent is taken into new regions of the state space or when the distribution changes. We further argue that this hurts sample efficiency and convergence, because knowledge acquired for a specific set of goals generalizes poorly to other goals. We propose Goal2FlowNets, which use Generative Flow Networks (GFlowNets) to learn exploratory goal-conditioned policies that are robust and generalize better by learning multiple nearly optimal paths to reach the goals. We show that this leads to a significant improvement in sample complexity and enables better zero-shot and few-shot generalization to novel environmental changes, through the learning of a stochastic goal-conditioned policy with wide coverage of the state and goal space.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 769