Abstract: This paper presents a novel approach to reinforcement learning (RL) that uses Bayesian flow networks for sequence generation, enabling effective planning in both discrete and continuous domains by conditioning on returns and current states. We explore two conditioning strategies: state inpainting and a classifier-free method. Experimental results demonstrate the robustness of our method across various environments: it adeptly navigates gridworld environments in discrete settings while matching the current state of the art on continuous tasks. The results highlight our approach's ability to capture spatial and temporal dependencies through a specialized neural network architecture combining 2D convolutions with a temporal U-Net.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Andrew_Kyle_Lampinen1
Submission Number: 3003