Generative Actor Critic

16 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Sequential Decision Making, Generative Model, Reinforcement Learning
TL;DR: Generative Actor Critic decouples decision-making by learning a generative model over trajectories and returns, then uses versatile inference on that model for policy improvement.
Abstract: Conventional Reinforcement Learning (RL) algorithms, typically focused on estimating or maximizing expected returns, face challenges when refining offline pretrained models with online experiences. This paper introduces Generative Actor Critic (GAC), a novel framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns, $p(\tau, y)$, and policy improvement as performing versatile inference on this learned model. To operationalize GAC, we present a specific instantiation based on a latent variable model with continuous latent plan vectors. We develop novel inference strategies for both exploitation, by optimizing latent plans to maximize expected returns, and exploration, by sampling latent plans conditioned on dynamically adjusted target returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods, even in the absence of step-wise rewards.
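The abstract does not specify the model architecture, so the sketch below is only a rough illustration of the two inference modes it describes: exploitation by optimizing a latent plan against a predicted return, and exploration by sampling a latent plan conditioned on a target return. The modules `return_head` and `prior_net` and the functions `exploit_plan` / `explore_plan` are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch (illustrative only, not the GAC implementation) of the two
# inference strategies described in the abstract, assuming a hypothetical
# latent-variable model with a return predictor r_hat(z) and a
# return-conditioned prior p(z | y).
import torch
import torch.nn as nn

latent_dim = 16

# Hypothetical return predictor r_hat(z) over latent plans z.
return_head = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
# Hypothetical return-conditioned prior producing (mu, log_sigma) for p(z | y).
prior_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 2 * latent_dim))


def exploit_plan(steps: int = 50, lr: float = 0.1) -> torch.Tensor:
    """Exploitation: optimize a latent plan z to maximize the predicted return."""
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -return_head(z).mean()  # gradient ascent on the predicted return
        loss.backward()
        opt.step()
    return z.detach()


def explore_plan(target_return: float) -> torch.Tensor:
    """Exploration: sample a latent plan conditioned on a (dynamically set) target return."""
    y = torch.tensor([[target_return]])
    mu, log_sigma = prior_net(y).chunk(2, dim=-1)
    return mu + log_sigma.exp() * torch.randn_like(mu)


# The resulting latent plan would then be decoded into a trajectory by the
# generative model of p(tau, y); that decoder is omitted here.
```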
Primary Area: reinforcement learning
Submission Number: 6758