Learning Variational Temporal Abstraction Embeddings in Option-Induced MDPs

13 May 2024 (modified: 06 Nov 2024) · Submitted to NeurIPS 2024 · CC BY 4.0
Keywords: reinforcement learning, option framework
TL;DR: We propose the Variational Markovian Option Critic (VMOC), a framework for learning option embeddings on MDPs that significantly outperforms baseline methods in Gymnasium MuJoCo environments.
Abstract: The option framework in hierarchical reinforcement learning has notably advanced the automatic discovery of temporally extended actions in long-horizon tasks. However, existing methods often struggle with ineffective exploration and unstable updates when learning action and option policies simultaneously. To address these challenges, we introduce the Variational Markovian Option Critic (VMOC), an off-policy algorithm with provable convergence that employs variational inference to stabilize updates. VMOC naturally integrates maximum-entropy intrinsic rewards to promote the exploration of diverse and effective options. Furthermore, we adopt low-cost option embeddings instead of traditional, computationally expensive option tuples, enhancing scalability and expressiveness. Extensive experiments in challenging MuJoCo environments validate VMOC's superior performance over existing on-policy and off-policy methods, demonstrating its effectiveness in learning coherent and diverse option sets for complex tasks.
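
As a loose illustration of the approach described in the abstract, below is a minimal PyTorch sketch of an action policy conditioned on a learned option embedding, trained with a soft (maximum-entropy) actor loss. All names here (`EmbeddedOptionPolicy`, `soft_actor_loss`, `q_fn`, `alpha`) are hypothetical and not taken from the paper; this shows the general technique, not VMOC's actual implementation.

```python
import torch
import torch.nn as nn

class EmbeddedOptionPolicy(nn.Module):
    """Gaussian action policy conditioned on a state and a learned option embedding."""
    def __init__(self, state_dim, action_dim, num_options, embed_dim=8):
        super().__init__()
        # Low-cost option embeddings stand in for explicit per-option parameter tuples.
        self.option_embed = nn.Embedding(num_options, embed_dim)
        self.trunk = nn.Sequential(nn.Linear(state_dim + embed_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, action_dim)
        self.log_std = nn.Linear(64, action_dim)

    def forward(self, state, option):
        z = self.option_embed(option)                    # (B, embed_dim)
        h = self.trunk(torch.cat([state, z], dim=-1))
        std = self.log_std(h).clamp(-5.0, 2.0).exp()     # keep std in a sane range
        return torch.distributions.Normal(self.mu(h), std)

def soft_actor_loss(policy, q_fn, state, option, alpha=0.2):
    """Maximum-entropy actor loss: maximize Q(s, o, a) + alpha * policy entropy."""
    dist = policy(state, option)
    action = dist.rsample()                              # reparameterized sample
    log_prob = dist.log_prob(action).sum(dim=-1)         # per-sample log-density
    return (alpha * log_prob - q_fn(state, option, action)).mean()

# Toy usage with a placeholder critic, just to show the shapes involved.
policy = EmbeddedOptionPolicy(state_dim=17, action_dim=6, num_options=4)
q_fn = lambda s, o, a: torch.zeros(s.shape[0])           # stand-in for a learned Q-network
loss = soft_actor_loss(policy, q_fn, torch.randn(32, 17), torch.randint(0, 4, (32,)))
```

Here the `alpha * log_prob` term plays the role of the maximum-entropy intrinsic reward, and the `nn.Embedding` table is one plausible reading of the low-cost option embeddings that replace per-option parameter tuples.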
Supplementary Material: zip
Primary Area: Reinforcement learning
Submission Number: 6055