Flow Q-Learning

Published: 01 May 2025, Last Modified: 18 Jun 2025, Venue: ICML 2025 poster, License: CC BY 4.0
Abstract: We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. Training a flow policy with RL is difficult because action generation is iterative. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we completely avoid unstable recursive backpropagation and eliminate costly iterative action generation at test time, while still mostly maintaining expressivity. We experimentally show that FQL achieves strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in offline RL and offline-to-online RL.
Lay Summary: Can we leverage the power of recent generative models for reinforcement learning (RL) and robotic control? We propose a simple yet effective method that trains an RL agent using flow matching, a state-of-the-art generative model widely used in image and video generation. Using an iterative generative model for robotic control is not a trivial problem. The main challenge is that naively combining an iterative generative model with RL gives rise to an issue called "backpropagation through time," which makes training unstable and costly. We resolve this issue using distillation, a technique that "condenses" a complex iterative procedure into a simple, one-step generation process. Our solution not only makes training stable and cost-efficient, but also leads to state-of-the-art performance on many challenging simulated robotic tasks across robotic navigation and manipulation. We expect that this technique can be applied to solving a wide range of real, challenging robotic tasks.
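Illustrative sketch: The idea described in the abstract and lay summary can be made concrete with a small, dependency-free Python sketch. It is not the authors' implementation (see the code link below for that); it only illustrates, under stated assumptions, the three ingredients named above: a flow-matching regression target, iterative action generation by Euler integration of a velocity field, and a one-step policy trained to maximize a value estimate while being distilled toward the iterative policy's output. All function names, the weight `alpha`, the step count `num_steps`, and the closed-form `toy_velocity` field are hypothetical and chosen for illustration.

```python
def flow_matching_target(noise, action, t):
    """Point on the straight noise-to-action path at time t, and the
    velocity a flow-matching model would be regressed onto there."""
    x_t = [(1 - t) * n + t * a for n, a in zip(noise, action)]
    v_target = [a - n for n, a in zip(noise, action)]
    return x_t, v_target


def euler_sample(velocity, state, noise, num_steps=10):
    """Iterative action generation: integrate the velocity field from
    t=0 to t=1 with Euler steps. Differentiating a value through this
    loop is the recursive backpropagation the abstract refers to."""
    x = list(noise)
    for k in range(num_steps):
        t = k / num_steps
        v = velocity(state, x, t)
        x = [xi + vi / num_steps for xi, vi in zip(x, v)]
    return x


def one_step_actor_loss(one_step_policy, velocity, q_fn, state, noise,
                        alpha=1.0, num_steps=10):
    """Assumed loss form: maximize Q at the one-step action, plus a
    distillation penalty toward the iterative flow policy's action for
    the same noise. The flow output is treated as a fixed target."""
    a_one = one_step_policy(state, noise)
    a_flow = euler_sample(velocity, state, noise, num_steps)
    distill = sum((p - q) ** 2 for p, q in zip(a_one, a_flow))
    return -q_fn(state, a_one) + alpha * distill


# Hypothetical toy field: the exact velocity for a dataset that contains
# a single action TARGET, so every noise sample flows to TARGET.
TARGET = [1.0, 2.0]

def toy_velocity(state, x, t):
    return [(a - xi) / (1.0 - t) for a, xi in zip(TARGET, x)]


action = euler_sample(toy_velocity, state=None, noise=[0.5, -1.0])
loss = one_step_actor_loss(
    one_step_policy=lambda s, z: [0.0, 0.0],  # stand-in for a network
    velocity=toy_velocity,
    q_fn=lambda s, a: sum(a),                 # stand-in for a critic
    state=None,
    noise=[0.5, -1.0],
)
```

In this sketch only the one-step policy would receive gradients from the value term, so no gradient has to pass through the Euler loop, and at test time an action costs a single call to `one_step_policy` rather than `num_steps` calls to the velocity model.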
Link To Code: https://seohong.me/projects/fql/
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: reinforcement learning
Submission Number: 3259