Keywords: Generative Flow Networks, Evolutionary Algorithms, Actor-Critic Methods, Reinforcement Learning, Sequential Modelling
TL;DR: DATE-GFN solves the sparse-reward credit assignment problem in GFlowNets by evolving a population of "teachable" critics that provide a dense, low-variance learning signal for a final distilled policy.
Abstract: Generative Flow Networks (GFlowNets) are powerful tools for scientific discovery but are severely hampered in sparse-reward, long-horizon settings by the temporal credit assignment problem, which causes high-variance gradients. While recent work has sought to densify learning signals \citep{Jang2023, Pan2023} or improve exploration with methods like Evolution Guided GFlowNets (EGFN) \citep{ikram2024egfn}, the fundamental variance issue for the learning agent persists. We introduce the Distillation-Aware Twisted Evolutionary GFlowNet (DATE-GFN), an actor-critic-inspired framework that recasts the problem. We advocate for a paradigm shift: instead of evolving policies, DATE-GFN evolves a population of critics (state-dependent value functions, or \emph{twist functions}) that learn to estimate the expected future reward from any state. This constructs a dense, state-dependent guidance signal, transforming the high-variance, reward-driven learning problem into a stable, low-variance supervised distillation task in which the student GFlowNet learns to imitate the policy induced by the best critic. Crucially, we address the inherent \emph{realization gap} between an optimal teacher and a finite-capacity student via a novel \textbf{distillation-aware fitness function}. This objective creates a principled trade-off: it simultaneously rewards critics for discovering high-reward states and penalizes them for poor \emph{teachability}, measured by the KL divergence between their induced policy and the student's. The result is a symbiotic co-evolutionary dynamic in which the evolutionary search for better critics is continuously grounded in the student's current learning capabilities. We prove this system converges to a realizable, high-performing equilibrium and show empirically that DATE-GFN significantly outperforms state-of-the-art baselines.
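The distillation-aware fitness described in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the function names (`distillation_aware_fitness`, `kl_divergence`), the choice of a per-state softmax over critic values to induce the teacher policy, and the trade-off weight `kl_weight` are all illustrative stand-ins for details the abstract does not specify.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions; eps guards log(0).
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def distillation_aware_fitness(critic_values, student_logits, reward, kl_weight=0.1):
    """Toy fitness for one critic at one state.

    critic_values  : the critic's (twist) values over the available actions,
                     inducing a teacher policy via a softmax (an assumption here).
    student_logits : the student GFlowNet's action logits at the same state.
    reward         : the high-reward-discovery term from the critic's rollouts.
    The KL term penalizes critics whose induced policy the student cannot
    yet imitate, i.e. poor teachability.
    """
    teacher_policy = softmax(np.asarray(critic_values, dtype=float))
    student_policy = softmax(np.asarray(student_logits, dtype=float))
    teachability_penalty = kl_divergence(teacher_policy, student_policy)
    return reward - kl_weight * teachability_penalty
```

Under this sketch, a critic whose induced policy already matches the student's incurs no penalty, while a sharply peaked teacher facing a uniform student pays a cost proportional to their KL divergence, which is the grounding mechanism the abstract describes.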
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 8874