Hierarchical RL Using an Ensemble of Proprioceptive Periodic Policies

Kenneth Marino; Abhinav Gupta; Rob Fergus; Arthur Szlam

Published: 21 Dec 2018, Last Modified: 05 May 2023. ICLR 2019 Conference Blind Submission. Readers: Everyone

Abstract: In this paper we introduce a simple, robust approach to hierarchically training an agent in the setting of sparse-reward tasks. The agent is split into a set of low-level policies and a high-level policy. Each low-level policy accesses only the internal, proprioceptive dimensions of the state observation and is trained with a simple reward that encourages changing the values of the non-proprioceptive dimensions. Furthermore, each low-level policy is induced to be periodic through the use of a "phase function." The high-level policy is trained using a sparse, task-dependent reward and operates by choosing which of the low-level policies to run at any given time. Using this approach, we solve difficult maze and navigation tasks with sparse rewards using the MuJoCo Ant and Humanoid agents and show improvement over recent hierarchical methods.
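The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The observation layout (`N_EXTERNAL` leading non-proprioceptive entries), the sine/cosine form of the phase feature, the period `PERIOD`, the Euclidean-displacement reward, and all function names are assumptions made here for illustration; the abstract specifies none of them.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical layout: the first N_EXTERNAL entries of an observation are
# non-proprioceptive (e.g. a global x, y position); the rest are proprioceptive.
N_EXTERNAL = 2
PERIOD = 10  # hypothetical phase period, in time steps


def split_observation(obs: Sequence[float]):
    """Return (non-proprioceptive dims, proprioceptive dims)."""
    return list(obs[:N_EXTERNAL]), list(obs[N_EXTERNAL:])


def phase(t: int, period: int = PERIOD) -> List[float]:
    """Cyclic time feature that repeats every `period` steps (assumed form)."""
    angle = 2.0 * math.pi * (t % period) / period
    return [math.sin(angle), math.cos(angle)]


def low_level_input(obs: Sequence[float], t: int) -> List[float]:
    """A low-level policy sees only proprioceptive dims plus the phase."""
    _, proprio = split_observation(obs)
    return proprio + phase(t)


def low_level_reward(prev_obs: Sequence[float], obs: Sequence[float]) -> float:
    """Reward for changing the non-proprioceptive dims.

    Euclidean displacement is an assumption; the abstract only says the
    reward encourages changing these values.
    """
    prev_ext, _ = split_observation(prev_obs)
    ext, _ = split_observation(obs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ext, prev_ext)))


def hierarchical_action(
    obs: Sequence[float],
    t: int,
    high_level: Callable[[Sequence[float]], int],
    low_levels: List[Callable[[List[float]], object]],
):
    """The high-level policy picks an index; that low-level policy acts."""
    k = high_level(obs)
    return low_levels[k](low_level_input(obs, t))
```

In this sketch the high-level policy is any callable that maps an observation to an index into the ensemble, so it could be trained separately on the sparse task reward while the low-level policies are trained on `low_level_reward` alone.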

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco), [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym)
