Keywords: GFlowNets, reinforcement learning, combinatorial multi-armed bandit
TL;DR: Combining the CMAB framework with GFNs to accelerate convergence and improve the generation quality of GFNs.
Abstract: As a probabilistic sampling framework, Generative Flow Networks (GFNs) show strong potential for constructing complex combinatorial objects through the sequential composition of elementary components. However, existing GFNs often suffer from excessive exploration over vast state spaces, leading to over-sampling of low-reward regions and convergence to suboptimal distributions. Effectively biasing GFNs toward high-reward solutions remains a non-trivial challenge. In this paper, we propose CBFlowNet, which integrates a combinatorial multi-armed bandit (CMAB) framework with GFN policies. The CMAB component prunes low-quality actions, yielding compact subspaces for exploration. Restricting GFNs to these compact subspaces accelerates the discovery of high-value candidates, while the reduced complexity enables faster convergence. Experimental results on multiple tasks demonstrate that CBFlowNet generates higher-reward candidates than existing approaches, without sacrificing diversity. All implementations are publicly available at \url{https://anonymous.4open.science/r/CBFlowNet-E0BA/}.
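To make the pruning-then-sampling idea concrete, below is a minimal illustrative sketch (not the authors' implementation): a UCB-style bandit scores actions, the top-k actions form the compact subspace, and a GFN-like forward policy samples only within that subspace. All names (`top_k`, `value_est`, the random logits standing in for a learned policy network) are hypothetical placeholders, and the exact scoring and update rules used by CBFlowNet may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

n_actions = 20          # size of the full action space (hypothetical)
top_k = 5               # size of the pruned subspace kept for the GFN policy

# Bandit statistics, assumed to be updated from observed rewards
counts = np.ones(n_actions)                 # pull counts per action
value_est = rng.uniform(size=n_actions)     # empirical mean reward per action
t = counts.sum()

# UCB score: empirical mean plus an exploration bonus
ucb = value_est + np.sqrt(2.0 * np.log(t) / counts)

# Prune: keep only the top-k actions as the compact subspace
pruned = np.argsort(ucb)[-top_k:]

# GFN-like forward policy: softmax over stand-in policy logits,
# restricted to the pruned actions via masking
logits = rng.normal(size=n_actions)         # placeholder for a learned policy network
masked = np.full(n_actions, -np.inf)
masked[pruned] = logits[pruned]
probs = np.exp(masked - masked[pruned].max())
probs /= probs.sum()

action = rng.choice(n_actions, p=probs)
print("pruned subspace:", pruned, "sampled action:", action)
```

The design intent conveyed by the abstract is that the bandit layer removes low-quality actions before the GFN policy ever sees them, so the flow network trains over a smaller, higher-reward action set rather than the full combinatorial space.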
Primary Area: reinforcement learning
Submission Number: 12280