QGFN: Controllable Greediness with Action Values

Published: 17 Jun 2024, Last Modified: 18 Jul 20242nd SPIGM @ ICML PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: GFlowNets, generative models, reinforcement learning, molecule design
TL;DR: We combine a GFlowNet policy and an action-value estimate, $Q$ into mixture policies that get better rewards without losing diversity
Abstract: Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to improve on the number of high-reward samples generated in a variety of tasks without sacrificing diversity.
Submission Number: 48
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview