Flow Actor-Critic for Offline Reinforcement Learning

ICLR 2026 Conference Submission 17903 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reinforcement learning, offline reinforcement learning, flow actor-critic, flow policies, flow matching
TL;DR: We propose Flow Actor-Critic (FAC), which leverages a flow behavior proxy policy not only to penalize the critic via tractable behavior densities but also to regularize the actor, effectively handling value overestimation for out-of-distribution (OOD) actions.
Abstract: Datasets in offline reinforcement learning (RL) often exhibit complex, multi-modal distributions, which call for policies more expressive than the widely used Gaussian policies. To handle such datasets, we propose Flow Actor-Critic (FAC), a new actor-critic method for offline RL built on recent flow policies. FAC not only uses the flow model for the actor, as in previous flow policies, but also exploits the expressive flow model to obtain a conservative critic that prevents Q-value explosion in out-of-data regions. To this end, we introduce a new critic regularizer based on an accurate proxy behavior model obtained as a byproduct of the flow-based actor design. Leveraging the flow model in this joint way, FAC achieves new state-of-the-art performance on standard offline RL benchmarks, including D4RL and the recent OGBench.
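The sketch below is a minimal, illustrative PyTorch reading of the idea summarized in the TL;DR and abstract, not the authors' implementation: a conditional flow-matching behavior proxy is fit to the dataset, and a critic loss adds a penalty weighted by the (approximate) behavior density so that Q-values on low-density actions are pushed down. The function and parameter names (`actor_sample`, `behavior_logpdf`, `lam`, etc.) and the exact penalty form are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VelocityField(nn.Module):
    """v_theta(s, a_t, t): velocity field of a conditional flow-matching
    behavior proxy (a standard parameterization, assumed here; not
    necessarily the exact architecture used in the paper)."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, a_t, t):
        return self.net(torch.cat([state, a_t, t], dim=-1))


def flow_matching_loss(v_model, state, action):
    """Fit the behavior proxy with the usual conditional flow-matching
    objective: interpolate a_t between Gaussian noise and the dataset action
    and regress the velocity toward (action - noise)."""
    noise = torch.randn_like(action)
    t = torch.rand(action.shape[0], 1, device=action.device)
    a_t = (1.0 - t) * noise + t * action
    return F.mse_loss(v_model(state, a_t, t), action - noise)


def penalized_critic_loss(q_net, q_target, actor_sample, behavior_logpdf,
                          batch, gamma=0.99, lam=1.0):
    """TD loss plus a density-weighted penalty: Q-values on candidate actions
    with low behavior density (log-density supplied by the flow proxy, e.g.
    via the continuous normalizing-flow change-of-variables formula) are
    pushed down. The specific penalty form and the weight `lam` are
    illustrative guesses, not the paper's regularizer."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        a2 = actor_sample(s2)                              # next action from the actor
        target = r + gamma * (1.0 - done) * q_target(s2, a2)
    td = F.mse_loss(q_net(s, a), target)
    a_cand = actor_sample(s).detach()                      # candidate actions to penalize
    weight = torch.exp(-behavior_logpdf(s, a_cand))        # low density -> large weight
    penalty = (weight * q_net(s, a_cand)).mean()
    return td + lam * penalty
```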
Primary Area: reinforcement learning
Submission Number: 17903