Keywords: GFlowNets, reinforcement learning, combinatorial multi-armed bandit
TL;DR: Combining the CMAB framework with GFNs to accelerate convergence and improve the generation quality of GFNs.
Abstract: As a probabilistic sampling framework, Generative Flow Networks (GFNs) show strong potential for constructing complex combinatorial objects through the sequential composition of elementary components. However, existing GFNs often suffer from excessive exploration over vast state spaces, leading to over-sampling of low-reward regions and convergence to suboptimal distributions. Effectively biasing GFNs toward high-reward solutions remains a non-trivial challenge. In this paper, we propose CBFlowNet, which integrates a combinatorial multi-armed bandit (CMAB) framework with GFN policies. The CMAB component prunes low-quality actions, yielding compact subspaces for exploration. Restricting GFNs to these compact subspaces accelerates the discovery of high-value candidates, while the reduced complexity enables faster convergence. Experimental results on multiple tasks demonstrate that CBFlowNet generates higher-reward candidates than existing approaches, without sacrificing diversity. All implementations are publicly available at \url{https://anonymous.4open.science/r/CBFlowNet-E0BA/}.
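To make the pruning-then-sampling idea concrete, below is a minimal illustrative sketch (not the authors' implementation): a UCB-style bandit scores actions, the top-k actions form the compact subspace, and a GFN-like forward policy samples only within that subspace. All names (`top_k`, `value_est`, the random logits standing in for a learned policy network) are hypothetical placeholders, and the exact scoring and update rules used by CBFlowNet may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 20          # size of the full action space (hypothetical)
top_k = 5               # size of the pruned subspace kept for the GFN policy

# Bandit statistics, assumed to be updated from observed rewards
counts = np.ones(n_actions)                 # pull counts per action
value_est = rng.uniform(size=n_actions)     # empirical mean reward per action
t = counts.sum()

# UCB score: empirical mean plus an exploration bonus
ucb = value_est + np.sqrt(2.0 * np.log(t) / counts)

# Prune: keep only the top-k actions as the compact subspace
pruned = np.argsort(ucb)[-top_k:]

# GFN-like forward policy: softmax over stand-in policy logits,
# restricted to the pruned actions via masking
logits = rng.normal(size=n_actions)         # placeholder for a learned policy network
masked = np.full(n_actions, -np.inf)
masked[pruned] = logits[pruned]
probs = np.exp(masked - masked[pruned].max())
probs /= probs.sum()

action = rng.choice(n_actions, p=probs)
print("pruned subspace:", pruned, "sampled action:", action)
```

The design intent conveyed by the abstract is that the bandit layer removes low-quality actions before the GFN policy ever sees them, so the flow network trains over a smaller, higher-reward action set rather than the full combinatorial space.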
Primary Area: reinforcement learning
Submission Number: 12280