Toward Human Cognition-inspired High-Level Decision Making For Hierarchical Reinforcement Learning Agents

Rousslan Fernand Julien Dossa; Takashi Matsubara

28 May 2022 (modified: 05 May 2023); DARL 2022; Readers: Everyone

Keywords: Reinforcement learning, hierarchical reinforcement learning, world models, temporal abstraction, hierarchically organized behavior

TL;DR: Proposes a hierarchical world model (HWM) that improves the sample efficiency and final performance of model-based RL, and builds toward human cognition-inspired high-level decision making by integrating the HWM with HRL.

Abstract: The ability of humans to efficiently understand and learn to solve complex tasks with relatively limited data is attributed to our hierarchically organized decision-making process. Meanwhile, sample efficiency is a long-standing challenge for reinforcement learning (RL) agents, especially in long-horizon, sequential decision-making tasks with sparse and delayed rewards. Hierarchical reinforcement learning (HRL) augments RL agents with temporal abstraction to improve their efficiency in such complex tasks. However, most HRL methods base their decision making directly on dense low-level information and use a fixed temporal abstraction. We propose the hierarchical world model (HWM), which is geared toward capturing more flexible high-level, temporally abstract dynamics, as well as the low-level dynamics of the task. In preliminary experiments, using the HWM with model-based RL improved sample efficiency and final performance. An investigation of the state representations learned by the HWM also shows that they align with human intuition and understanding. Finally, we provide a theoretical foundation for integrating the proposed HWM with the HRL framework, thus building toward RL agents whose hierarchically structured decision making aligns with the theorized principles of human cognition and decision making.
