TL;DR: GP-guided diffusion models generate more stable subgoals in off-policy HRL by combining the diffusion model's expressiveness with the GP's uncertainty-aware feasibility estimates, improving sample efficiency and performance on complex tasks.
Abstract: Hierarchical reinforcement learning (HRL) learns to make decisions on multiple levels of temporal abstraction. A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals. To address this issue, the high-level policy must capture a complex subgoal distribution while also accounting for uncertainty in its estimates. We propose an approach that trains a conditional diffusion model, regularized by a Gaussian Process (GP) prior, to generate a diverse set of subgoals while leveraging the GP's principled uncertainty quantification. Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and the GP's predictive mean. Our approach outperforms prior HRL methods in both sample efficiency and final performance on challenging continuous control benchmarks.
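To make the selection strategy concrete, below is a minimal, hypothetical sketch (not the authors' code) of one way such a step could look: a trained conditional diffusion policy proposes a subgoal, a GP fit on past high-level transitions supplies a predictive mean and uncertainty, and the final subgoal is taken from whichever source the GP's confidence region supports. The function names, the stand-in diffusion sampler, and the z-score fallback rule are illustrative assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch of uncertainty-aware subgoal selection (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def diffusion_subgoal(state):
    """Stand-in for a trained conditional diffusion policy p(subgoal | state)."""
    return state[:2] + rng.normal(scale=0.5, size=2)  # placeholder sampler

# GP fit on past (state, reached-subgoal) pairs, e.g. from a high-level replay buffer.
states = rng.uniform(-1.0, 1.0, size=(200, 4))
reached = states[:, :2] + 0.1 * rng.standard_normal((200, 2))
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(states, reached)

def select_subgoal(state, z_max=2.0):
    g_diff = diffusion_subgoal(state)                      # diffusion candidate
    g_gp, std = gp.predict(state[None], return_std=True)   # GP mean and uncertainty
    g_gp, std = g_gp[0], np.maximum(std, 1e-6).ravel()
    # Keep the expressive diffusion sample when it lies inside the GP's confidence
    # region (a rough feasibility check); otherwise fall back to the GP's mean.
    z = np.abs(g_diff - g_gp) / std
    return g_diff if z.max() <= z_max else g_gp

print(select_subgoal(rng.uniform(-1.0, 1.0, size=4)))
```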
Lay Summary: Think of teaching a robot a complex job, like tidying a room. Instead of one giant instruction, a "manager" robot gives a "worker" robot a series of smaller, achievable steps, like "pick up the toy" then "put it in the box." This is called Hierarchical Reinforcement Learning. A big problem is that the worker robot is always learning and improving, so the manager struggles to give good, up-to-date steps.
Our new method helps the manager robot choose better steps. We use a smart system (a "conditional diffusion model") that can imagine many possible good next steps. To make sure these imagined steps are sensible and achievable, we use another system (a "Gaussian Process") that learns from past successes and warns about risky steps. By combining these two, the manager robot gives better instructions, helping the worker robot learn faster and do its job more effectively.
Primary Area: Reinforcement Learning->Deep RL
Keywords: Hierarchical Reinforcement Learning
Submission Number: 12094