Keywords: Generative Flow Networks (GFlowNets), proxy-free guidance
TL;DR: We propose a proxy-free framework for offline GFlowNet training that learns dense guidance from trajectories via IRL, steering exploration through DAG pruning and prioritized sampling to improve both training efficiency and final sample quality.
Abstract: Generative Flow Networks (GFlowNets) are effective at sampling diverse, high-reward objects, but in many real-world settings where new reward queries are infeasible, they must be trained from offline datasets. The prevailing training methods rely on a proxy model to provide reward feedback for online-sampled trajectories. However, in scenarios where constructing a reliable proxy is challenging due to data scarcity or cost, one must instead train from static offline trajectories. Current proxy-free approaches often rely on coarse constraints that may limit the model's ability to explore. To overcome these challenges, we propose **Trajectory-Distilled GFlowNet (TD-GFN)**, a novel proxy-free training framework. TD-GFN learns dense, transition-level edge rewards from offline trajectories via inverse reinforcement learning, providing rich structural guidance for efficient exploration. Crucially, to ensure robustness, these rewards are used only indirectly: they guide the policy through DAG pruning and prioritized backward sampling of training trajectories, so that the final gradient updates depend solely on ground-truth terminal rewards from the dataset, thereby preventing propagation of errors in the learned rewards. Experiments show that TD-GFN significantly outperforms a broad range of existing baselines in both convergence speed and final sample quality, establishing a more robust and efficient paradigm for offline GFlowNet training.
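To make the indirect use of learned rewards concrete, here is a minimal toy sketch of the two mechanisms the abstract names: pruning DAG edges whose learned reward falls below a threshold, then sampling a training trajectory backward from a terminal state with edge-reward-weighted priorities. All names, values, and the threshold are illustrative assumptions, not the paper's actual algorithm or hyperparameters; note the learned rewards never enter the loss itself.

```python
import random

# Hypothetical toy DAG, stored as node -> list of parent nodes (backward edges).
parents = {
    "s3": ["s1", "s2"],
    "s1": ["s0"],
    "s2": ["s0"],
    "s0": [],
}

# Edge rewards as IRL might learn them (illustrative values only).
edge_reward = {
    ("s0", "s1"): 0.9,
    ("s0", "s2"): 0.3,
    ("s1", "s3"): 0.8,
    ("s2", "s3"): 0.2,
}

PRUNE_THRESHOLD = 0.25  # assumed cutoff for discarding unpromising edges

def pruned_parents(node):
    """Keep only backward edges whose learned reward clears the threshold."""
    return [p for p in parents[node] if edge_reward[(p, node)] >= PRUNE_THRESHOLD]

def backward_sample(terminal, rng=random):
    """Sample a trajectory backward from a terminal state, choosing among
    surviving edges with probability proportional to their learned reward."""
    traj = [terminal]
    node = terminal
    while pruned_parents(node):
        cands = pruned_parents(node)
        weights = [edge_reward[(p, node)] for p in cands]
        node = rng.choices(cands, weights=weights, k=1)[0]
        traj.append(node)
    return list(reversed(traj))

traj = backward_sample("s3")
# The policy update for this trajectory would then use only the dataset's
# ground-truth terminal reward for s3, never the learned edge rewards.
```

With the illustrative values above, the edge (s2, s3) is pruned, so backward sampling from s3 deterministically recovers the trajectory s0 → s1 → s3.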
Primary Area: reinforcement learning
Submission Number: 7075