TL;DR: We uncover a previously overlooked connection between GFlowNets and reinforcement learning by investigating their equivalence through uniform policy evaluation.
Abstract: The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects with probability proportional to an unnormalized reward function. GFlowNets share a strong connection with reinforcement learning (RL), which typically aims to maximize reward. A number of recent works have explored connections between GFlowNets and maximum entropy (MaxEnt) RL, which incorporates entropy regularization into the standard RL objective. However, the relationship between GFlowNets and standard RL remains largely unexplored, despite the inherent similarity of their sequential decision-making nature.
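For readers less familiar with the two frameworks contrasted above, their objectives can be summarized in standard notation from the GFlowNet and MaxEnt RL literature (this is generic background, not notation specific to this submission):

```latex
% GFlowNet goal: sample terminal objects x with probability proportional to reward,
P^\top(x) \;\propto\; R(x),
% typically enforced via flow-matching conditions on a flow function F, e.g.
\sum_{(s' \to s)} F(s' \to s) \;=\; \sum_{(s \to s'')} F(s \to s'').

% MaxEnt RL objective: maximize return plus an entropy bonus
% (standard, non-MaxEnt RL corresponds to \alpha = 0)
\max_\pi \; \mathbb{E}_\pi\!\left[\, \sum_t r(s_t, a_t) \;+\; \alpha\, \mathcal{H}\!\big(\pi(\cdot \mid s_t)\big) \right].
```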
While GFlowNets can discover diverse solutions through specialized flow-matching objectives, connecting them to standard RL can simplify their implementation through well-established RL principles and improve RL's capability for diverse solution discovery (a critical requirement in many real-world applications); bridging this gap can thus unlock the potential of both fields. In this paper, we bridge this gap by revealing a fundamental connection between GFlowNets and one of the most basic components of RL: policy evaluation. Surprisingly, we find that the value function obtained by evaluating a uniform policy is closely related to the flow functions in GFlowNets. Building on this insight, we introduce a rectified random policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets simply by evaluating a fixed random policy, offering a new perspective.
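To build intuition for the claimed connection, here is a minimal toy sketch on a tree-structured state space. It contrasts plain policy evaluation under a uniform policy (which averages successor values) with a "rectified" backup that rescales the uniform-policy average by the out-degree. The rectification shown here is an illustrative correction under the tree assumption, not necessarily the exact RPE update proposed in the paper, and all names in the snippet are hypothetical:

```python
# Toy illustration (not the paper's algorithm): on a small tree, a rectified
# uniform-policy backup yields flow-like values whose induced policy samples
# terminal states proportionally to their rewards.

children = {            # edges point from parents to children
    "s0": ["a", "b"],
    "a":  ["x1", "x2"],
    "b":  [],            # terminal
    "x1": [],            # terminal
    "x2": [],            # terminal
}
reward = {"b": 3.0, "x1": 1.0, "x2": 2.0}  # rewards on terminal states only


def backup(state, rectify):
    """Backward-induction policy evaluation under the uniform policy.

    rectify=False : V(s) = mean_{s'} V(s')                 (plain uniform-policy evaluation)
    rectify=True  : F(s) = |children(s)| * mean_{s'} F(s') = sum_{s'} F(s')
    """
    kids = children[state]
    if not kids:                        # terminal state: value equals its reward
        return reward[state]
    vals = [backup(k, rectify) for k in kids]
    avg = sum(vals) / len(kids)         # uniform-policy expectation over actions
    return len(kids) * avg if rectify else avg


def terminal_probs(state, rectify, prob=1.0, out=None):
    """Distribution over terminals when following pi(s'|s) proportional to the
    (rectified) value of the child state s'."""
    if out is None:
        out = {}
    kids = children[state]
    if not kids:
        out[state] = out.get(state, 0.0) + prob
        return out
    vals = [backup(k, rectify) for k in kids]
    total = sum(vals)
    for k, v in zip(kids, vals):
        terminal_probs(k, rectify, prob * v / total, out)
    return out


if __name__ == "__main__":
    print("uniform-policy values:", {s: backup(s, False) for s in children})
    print("rectified values     :", {s: backup(s, True) for s in children})
    # With rectification, the terminal distribution matches R(x) / sum_x R(x).
    print("sampling distribution:", terminal_probs("s0", rectify=True))
```

On a tree, the rectified backup coincides with the GFlowNet flow-matching recursion (each state's flow is the sum of its children's flows), which is why the induced policy samples each terminal x with probability R(x)/Z, whereas the unrectified uniform-policy values do not.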
Empirical results across extensive benchmarks demonstrate that RPE achieves performance competitive with previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.
Lay Summary: This paper establishes a novel connection between Generative Flow Networks (GFlowNets) and standard (non-MaxEnt) reinforcement learning (RL) through the lens of policy evaluation. While prior work linked GFlowNets to MaxEnt RL, we show that the flow functions in GFlowNets are fundamentally connected to the value function obtained by evaluating a uniform policy in standard RL. Building on this insight, we introduce Rectified Random Policy Evaluation (RPE), which achieves the core reward-matching objective of GFlowNets (sampling proportionally to reward) simply by evaluating a fixed, uniform random policy within the standard RL framework. This previously overlooked link enables bidirectional benefits: it simplifies GFlowNet implementation via RL tools and enhances RL's capability for diverse solution discovery. Experiments across extensive benchmarks demonstrate that RPE achieves performance competitive with previous GFlowNet and RL methods.
Primary Area: Probabilistic Methods
Keywords: Generative Flow Networks (GFlowNets), Policy Evaluation
Submission Number: 9072