Guided Actor-Critic: Off-Policy Partially Observable Reinforcement Learning with Privileged Information
Keywords: reinforcement learning, POMDPs, off-policy, teacher-student learning
Abstract: Real-world decision-making systems often operate under partial observability due to limited sensing or noisy observations, which poses significant challenges for reinforcement learning (RL).
A common strategy to mitigate this issue is to leverage privileged information—available only during training—to guide the learning process. While existing approaches such as policy distillation and asymmetric actor-critic methods make use of such information, they frequently suffer from weak supervision or suboptimal knowledge transfer.
In this work, we propose Guided Actor-Critic (GAC), a novel off-policy RL algorithm that unifies privileged policy and value learning under a guided policy iteration framework.
GAC jointly trains a fully observable policy and a partially observable policy using constrained RL and supervised learning objectives, respectively.
We theoretically establish convergence in the tabular case and empirically validate GAC on challenging benchmarks, including Brax, POPGym, and HumanoidBench, where it achieves superior sample efficiency and final performance.
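The abstract does not spell out the concrete objectives, but a minimal sketch of how such a guided update could look is given below, assuming a penalized (rather than hard-constrained) teacher step and KL-based distillation for the student. All names, dimensions, and loss forms (e.g., GaussianPolicy, update, kl_coef) are illustrative assumptions, not the paper's actual implementation; critic/Bellman updates are omitted for brevity.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Small MLP policy producing a diagonal Gaussian over actions (illustrative)."""
    def __init__(self, in_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, x):
        h = self.net(x)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

# Hypothetical dimensions: full (privileged) state vs. partial observation.
state_dim, obs_dim, act_dim = 16, 8, 4
teacher = GaussianPolicy(state_dim, act_dim)   # fully observable (privileged) policy
student = GaussianPolicy(obs_dim, act_dim)     # partially observable (deployed) policy
critic = nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))

opt_teacher = torch.optim.Adam(teacher.parameters(), lr=3e-4)
opt_student = torch.optim.Adam(student.parameters(), lr=3e-4)

def update(batch, kl_coef=1.0):
    """One illustrative guided update: the teacher maximizes Q while being penalized
    for drifting from the student (a stand-in for the constrained-RL step); the
    student is trained by supervised distillation toward the teacher."""
    state, obs = batch["state"], batch["obs"]

    # Privileged (teacher) policy: actor-critic step with a proximity penalty.
    pi_t = teacher.dist(state)
    a_t = pi_t.rsample()
    q = critic(torch.cat([state, a_t], dim=-1))
    with torch.no_grad():
        pi_s_ref = student.dist(obs)
    kl_ts = torch.distributions.kl_divergence(pi_t, pi_s_ref).sum(-1)
    teacher_loss = (-q.squeeze(-1) + kl_coef * kl_ts).mean()
    opt_teacher.zero_grad(); teacher_loss.backward(); opt_teacher.step()

    # Partially observable (student) policy: supervised distillation from the teacher.
    with torch.no_grad():
        pi_t_ref = teacher.dist(state)
    pi_s = student.dist(obs)
    student_loss = torch.distributions.kl_divergence(pi_t_ref, pi_s).sum(-1).mean()
    opt_student.zero_grad(); student_loss.backward(); opt_student.step()
    return teacher_loss.item(), student_loss.item()

# Dummy off-policy (replay) batch just to show the call signature.
batch = {"state": torch.randn(32, state_dim), "obs": torch.randn(32, obs_dim)}
print(update(batch))
```

The key design choice this sketch tries to reflect is that privileged information enters only through the teacher policy and the privileged critic during training, while the student consumes only partial observations and can therefore be deployed without privileged inputs.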
Primary Area: reinforcement learning
Submission Number: 15223