Keywords: reinforcement learning, option framework
TL;DR: We propose the Variational Markovian Option Critic (VMOC), a framework for learning option embeddings on MDPs that significantly outperforms baselines on Gymnasium MuJoCo environments.
Abstract: The option framework in hierarchical reinforcement learning has notably advanced the automatic discovery of temporally-extended actions from long-horizon tasks. However, existing methods often struggle with ineffective exploration and unstable updates when learning action and option policies simultaneously. Addressing these challenges, we introduce the Variational Markovian Option Critic (VMOC), an off-policy algorithm with provable convergence that employs variational inference to stabilize updates. VMOC naturally integrates maximum entropy intrinsic rewards to promote the exploration of diverse and effective options. Furthermore, we adopt low-cost option embeddings instead of traditional, computationally expensive option tuples, enhancing scalability and expressiveness. Extensive experiments in challenging Mujoco environments validate VMOC’s superior performance over existing on-policy and off-policy methods, demonstrating its effectiveness in learning coherent and diverse option sets suitable for complex tasks.
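To make the abstract's key ideas concrete, below is a minimal, hypothetical sketch (not the authors' released code) of an entropy-regularized policy over learned option embeddings: a high-level policy selects an option index, its embedding conditions a low-level Gaussian action policy, and the entropies of both distributions serve as a maximum-entropy bonus. All names and dimensions (OptionEmbeddingPolicy, num_options, emb_dim) are assumptions for illustration.

```python
# Hypothetical sketch of an option policy with learned option embeddings and a
# maximum-entropy bonus; this is an illustration, not the paper's actual VMOC code.
import torch
import torch.nn as nn


class OptionEmbeddingPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, num_options=8, emb_dim=16, hidden=64):
        super().__init__()
        # Learnable option embeddings replace explicit option tuples.
        self.option_emb = nn.Embedding(num_options, emb_dim)
        # High-level policy: logits over options given the state.
        self.option_head = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, num_options)
        )
        # Low-level policy: Gaussian action distribution given state + option embedding.
        self.action_net = nn.Sequential(
            nn.Linear(state_dim + emb_dim, hidden), nn.Tanh()
        )
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        # Sample an option, look up its embedding, then sample an action conditioned on it.
        option_dist = torch.distributions.Categorical(logits=self.option_head(state))
        option = option_dist.sample()
        z = self.option_emb(option)
        h = self.action_net(torch.cat([state, z], dim=-1))
        action_dist = torch.distributions.Normal(self.mu(h), self.log_std(h).exp())
        action = action_dist.sample()
        # Maximum-entropy bonus over both option and action distributions,
        # encouraging diverse option usage and exploratory actions.
        entropy_bonus = option_dist.entropy() + action_dist.entropy().sum(-1)
        return action, option, entropy_bonus


# Usage on a dummy batch of states (dimensions are placeholders).
policy = OptionEmbeddingPolicy(state_dim=17, action_dim=6)
states = torch.randn(4, 17)
actions, options, bonus = policy(states)
```

In an off-policy actor-critic setup such as the one the abstract describes, a bonus of this kind would typically be folded into the critic targets or the policy objective so that option diversity is rewarded alongside task return.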
Supplementary Material: zip
Primary Area: Reinforcement learning
Submission Number: 6055