Effectively Learning Initiation Sets in Hierarchical Reinforcement Learning

Akhil Bagaria; Ben M Abbatematteo; Omer Gottesman; Matt Corsaro; Sreehari Rammohan; George Konidaris

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 poster

Keywords: hierarchical reinforcement learning

TL;DR: Jointly learning option initiation sets and policies in online reinforcement learning

Abstract: An agent learning an option in hierarchical reinforcement learning must solve three problems: identify the option's subgoal (termination condition), learn a policy, and learn where that policy will succeed (initiation set). The termination condition is typically identified first, but the option policy and initiation set must be learned simultaneously, which is challenging because the initiation set depends on the option policy, which changes as the agent learns. Consequently, data obtained from option execution becomes invalid over time, leading to an inaccurate initiation set that subsequently harms downstream task performance. We highlight three issues---data non-stationarity, temporal credit assignment, and pessimism---specific to learning initiation sets, and propose to address them using tools from off-policy value estimation and classification. We show that our method learns higher-quality initiation sets faster than existing methods (in MiniGrid and Montezuma's Revenge), can automatically discover promising grasps for robot manipulation (in Robosuite), and improves the performance of a state-of-the-art option discovery method in a challenging maze navigation task in MuJoCo.
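To make the three components named in the abstract concrete, the following is a minimal sketch of a generic option as a data structure, not the authors' implementation. The names `Option`, `run_option`, `chain_step`, and `go_right`, the toy 1-D chain environment, and the hand-written initiation rule `s >= 5` are all illustrative assumptions; in the paper's setting the initiation set is learned rather than fixed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = int
Action = int


@dataclass
class Option:
    """Generic option: initiation set, policy, and termination condition."""
    initiation: Callable[[State], bool]   # can the option be started in this state?
    policy: Callable[[State], Action]     # action chosen while the option runs
    termination: Callable[[State], bool]  # has the subgoal been reached?


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 20) -> Tuple[State, bool]:
    """Execute the option from `state`; return (final_state, reached_subgoal)."""
    if not option.initiation(state):
        # Outside the initiation set the option is not executed at all.
        return state, False
    for _ in range(max_steps):
        if option.termination(state):
            return state, True
        state = step(state, option.policy(state))
    return state, option.termination(state)


def chain_step(s: State, a: Action) -> State:
    """Hypothetical toy environment: a 1-D chain with states 0..10."""
    return max(0, min(10, s + a))


# Hypothetical option: walk right until state 10, startable only from s >= 5.
go_right = Option(
    initiation=lambda s: s >= 5,
    policy=lambda s: +1,
    termination=lambda s: s == 10,
)

final_state, success = run_option(go_right, 7, chain_step)
```

Each call to `run_option` yields a (start state, success) pair, which is the kind of data an initiation classifier could be trained on. Because the outcome depends on the current policy, pairs collected under an earlier policy can become stale as the policy improves, which is the non-stationarity issue the abstract describes.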

Supplementary Material: pdf

Submission Number: 9166
