Keywords: Generative Flow Networks (GFlowNets), proxy-free guidance
TL;DR: We propose a proxy-free framework for offline GFlowNet training that learns dense guidance from trajectories via IRL, steering exploration through DAG pruning and prioritized sampling to improve both training efficiency and final sample quality.
Abstract: Generative Flow Networks (GFlowNets) are effective at sampling diverse, high-reward objects, but in many real-world settings where new reward queries are infeasible, they must be trained from offline datasets. The prevailing training methods rely on a proxy model to provide reward feedback for online-sampled trajectories. However, in scenarios where constructing a reliable proxy is challenging due to data scarcity or cost, one must instead train from static offline trajectories. Current proxy-free approaches often rely on coarse constraints that may limit the model's ability to explore. To overcome these challenges, we propose **Trajectory-Distilled GFlowNet (TD-GFN)**, a novel proxy-free training framework. TD-GFN learns dense, transition-level edge rewards from offline trajectories via inverse reinforcement learning, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards are used only indirectly: they guide the policy through DAG pruning and prioritized backward sampling of training trajectories, so that the final gradient updates depend solely on ground-truth terminal rewards from the dataset, thereby preventing propagation of errors in the learned rewards. Experiments show that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and final sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.
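To make the indirect use of learned rewards concrete, here is a minimal toy sketch of the two mechanisms the abstract names: pruning DAG edges whose learned reward falls below a threshold, then sampling a training trajectory backward from a terminal state with edge-reward-weighted priorities. All names, values, and the threshold are illustrative assumptions, not the paper's actual algorithm or hyperparameters; note the learned rewards never enter the loss itself.

```python
import random

# Hypothetical toy DAG, stored as node -> list of parent nodes (backward edges).
parents = {
    "s3": ["s1", "s2"],
    "s1": ["s0"],
    "s2": ["s0"],
    "s0": [],
}

# Edge rewards as IRL might learn them (illustrative values only).
edge_reward = {
    ("s0", "s1"): 0.9,
    ("s0", "s2"): 0.3,
    ("s1", "s3"): 0.8,
    ("s2", "s3"): 0.2,
}

PRUNE_THRESHOLD = 0.25  # assumed cutoff for discarding unpromising edges

def pruned_parents(node):
    """Keep only backward edges whose learned reward clears the threshold."""
    return [p for p in parents[node] if edge_reward[(p, node)] >= PRUNE_THRESHOLD]

def backward_sample(terminal, rng=random):
    """Sample a trajectory backward from a terminal state, choosing among
    surviving edges with probability proportional to their learned reward."""
    traj = [terminal]
    node = terminal
    while pruned_parents(node):
        cands = pruned_parents(node)
        weights = [edge_reward[(p, node)] for p in cands]
        node = rng.choices(cands, weights=weights, k=1)[0]
        traj.append(node)
    return list(reversed(traj))

traj = backward_sample("s3")
# The policy update for this trajectory would then use only the dataset's
# ground-truth terminal reward for s3, never the learned edge rewards.
```

With the illustrative values above, the edge (s2, s3) is pruned, so backward sampling from s3 deterministically recovers the trajectory s0 → s1 → s3.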
Primary Area: reinforcement learning
Submission Number: 7075