Hierarchical Reinforcement Learning with Unlimited Recursive Subroutine Calls

Yuuji Ichisugi, Naoto Takahashi, Hidemoto Nakada, Takashi Sano

Published: 2019, Last Modified: 14 May 2025. Venue: ICANN (2) 2019. License: CC BY-SA 4.0

Abstract: Humans can set suitable subgoals to achieve certain tasks. They can also set sub-subgoals recursively if required. The depth of this recursion is apparently unlimited. Inspired by this behavior, we propose a new hierarchical reinforcement learning architecture called RGoal. RGoal solves the Markov Decision Process (MDP) in an augmented state-action space. In multitask settings, sharing subroutines between tasks makes learning faster. A novel mechanism called thought-mode is a type of model-based reinforcement learning. It combines learned simple tasks to solve unknown complicated tasks rapidly, sometimes in zero-shot time.
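
To make the idea of an augmented state-action space with recursive subroutine calls more concrete, here is a minimal sketch. It is not the authors' implementation, and it involves no learning. It assumes a toy 1-D corridor in which the state is the pair (position, current subgoal), the action set holds two primitive moves plus a hypothetical `("call", m)` action that switches to subgoal `m`, and a plain stack remembers the goals to resume. The class name `AugmentedCorridor` and every identifier in it are illustrative assumptions, not names from the paper.

```python
from dataclasses import dataclass, field

LEFT, RIGHT = "left", "right"  # primitive actions (illustrative names)


@dataclass
class AugmentedCorridor:
    """Toy 1-D corridor whose state is augmented with the current subgoal."""
    size: int
    position: int
    goal: int                                   # current (sub)goal position
    stack: list = field(default_factory=list)   # goals to resume after a call

    def step(self, action):
        """Apply a primitive move or a ("call", m) subroutine action."""
        if isinstance(action, tuple) and action[0] == "call":
            # Subroutine call: remember the current goal, switch to subgoal m.
            self.stack.append(self.goal)
            self.goal = action[1]
        elif action == LEFT:
            self.position = max(0, self.position - 1)
        elif action == RIGHT:
            self.position = min(self.size - 1, self.position + 1)
        else:
            raise ValueError(f"unknown action: {action!r}")
        # Return from every subroutine whose subgoal has now been reached.
        while self.position == self.goal and self.stack:
            self.goal = self.stack.pop()
        return (self.position, self.goal)

    def done(self):
        """The task ends when the top-level goal is reached."""
        return self.position == self.goal and not self.stack


if __name__ == "__main__":
    env = AugmentedCorridor(size=6, position=0, goal=5)
    # Nested calls: subgoal 4 is set first, then sub-subgoal 3 on top of it.
    for action in [("call", 4), ("call", 3), RIGHT, RIGHT, RIGHT, RIGHT, RIGHT]:
        print(action, "->", env.step(action), "stack:", env.stack)
    print("done:", env.done())
```

Because the stack is an ordinary list, nothing in this sketch bounds the nesting depth of calls, which mirrors the abstract's point that the recursion depth is not limited in advance. A learner would then estimate values over the (position, subgoal, action) triples, a step this sketch leaves out.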