GFlowNet Training by Policy Gradients

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Generative model, Variational Inference, Reinforcement Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Generative Flow Networks (GFlowNets) have shown an attractive capability to generate combinatorial objects with desired properties. In this paper, we propose a policy-dependent reward that bridges flow balance in GFlowNet training to the optimization of expected accumulated reward in traditional reinforcement learning (RL). This allows us to derive policy-based GFlowNet training strategies. It is known that training efficiency is affected by the design of the backward policy in GFlowNets. We therefore propose a coupled training strategy that jointly solves GFlowNet training and backward policy design. We provide a performance analysis with theoretical guarantees for our proposed methods. We further conduct experiments on both simulated and real-world datasets to verify that our policy-based strategies outperform existing GFlowNet training strategies.
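The bridge claimed in the abstract can be made concrete with a short sketch. The display below is an illustration under assumed, standard GFlowNet notation (forward policy P_F, backward policy P_B, reward R, partition function Z, trajectories tau terminating in objects x); it is not the submission's exact construction.

% A minimal sketch of the flow-balance / policy-gradient bridge,
% assuming standard GFlowNet notation; not the submission's own derivation.
\begin{align*}
  % The forward and backward policies induce trajectory distributions:
  P_F(\tau) = \prod_{t} P_F(s_{t+1} \mid s_t),
  \qquad
  P_B(\tau \mid x) = \prod_{t} P_B(s_t \mid s_{t+1}),
\end{align*}
\begin{align*}
  % Flow balance holds exactly when this KL divergence vanishes:
  \mathrm{KL}\!\left( P_F(\tau) \,\middle\|\, \frac{R(x)\, P_B(\tau \mid x)}{Z} \right)
  = -\, \mathbb{E}_{\tau \sim P_F}\!\left[ \log \frac{R(x)\, P_B(\tau \mid x)}{Z\, P_F(\tau)} \right].
\end{align*}

Reading the log-ratio inside the expectation as a policy-dependent trajectory reward, driving the KL to zero (i.e., achieving flow balance) coincides with maximizing an expected accumulated reward, so standard policy-gradient estimators from RL apply.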
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6741