Learning Intrinsically Motivated Options to Stimulate Policy Exploration

Louis Bagot; Kevin Mets; Steven Latré

12 Jun 2020 (modified: 05 May 2023), LifelongML@ICML2020, Readers: Everyone

Student First Author: Yes

Keywords: reinforcement learning, intrinsic motivation, curiosity, hierarchical reinforcement learning, options, exploration option

Abstract: A Reinforcement Learning (RL) agent needs to find an optimal sequence of actions in order to maximize rewards. This requires consistent exploration of states and action sequences to ensure that the policy found is optimal. One way to motivate exploration is through intrinsic rewards, i.e. rewards that the agent generates for itself to guide it towards interesting behaviors. We propose to learn from such intrinsic rewards through exploration options, i.e. additional temporally extended actions that call separate policies (or "Explorer" agents), each associated with an intrinsic reward. We show that this method has several key advantages over the usual approach of taking a weighted sum of rewards, mainly task-transfer abilities and scalability to multiple reward functions.

TL;DR: We decouple an RL agent into an Exploiter, which learns from the true task reward, and Explorers, which learn from intrinsic motivation.
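To make the contrast in the abstract concrete, the following is a minimal tabular sketch and not the authors' implementation. It sets the weighted-sum baseline next to an Exploiter whose action set is extended with one exploration option per Explorer. Several things here are assumptions made for illustration only: the count-based bonus, the tabular Q-learning updates, the names `QTable`, `resolve` and `learn`, the constants, the choice to let Explorers learn from every transition, and the simplification that an option lasts a single step (the abstract describes options as temporally extended).

```python
from collections import defaultdict

N_ACTIONS = 2     # hypothetical number of primitive actions
N_EXPLORERS = 1   # hypothetical number of intrinsic rewards, one Explorer each


def weighted_sum_reward(r_ext, r_int, beta=0.1):
    """Baseline: one policy learns from the task reward plus a scaled bonus."""
    return r_ext + beta * r_int


def count_bonus(counts, state):
    """Assumed intrinsic reward: 1/sqrt(visit count) novelty bonus."""
    counts[state] += 1
    return counts[state] ** -0.5


class QTable:
    """Tabular Q-learner over a fixed number of choices."""

    def __init__(self, n_choices, alpha=0.5, gamma=0.9):
        self.q = defaultdict(lambda: [0.0] * n_choices)
        self.alpha, self.gamma = alpha, gamma

    def greedy(self, state):
        values = self.q[state]
        return values.index(max(values))

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])


# The Exploiter chooses among primitive actions AND one option per Explorer.
exploiter = QTable(N_ACTIONS + N_EXPLORERS)
explorers = [QTable(N_ACTIONS) for _ in range(N_EXPLORERS)]


def resolve(choice, state):
    """Map the Exploiter's choice to a primitive action."""
    if choice < N_ACTIONS:
        return choice  # the Exploiter picked a primitive action directly
    # Otherwise the choice is an exploration option: delegate to that Explorer.
    return explorers[choice - N_ACTIONS].greedy(state)


def learn(s, choice, action, r_ext, r_int, s_next):
    """Exploiter sees only the task reward; Explorers only the intrinsic one."""
    exploiter.update(s, choice, r_ext, s_next)
    for explorer in explorers:
        # Assumption: every Explorer learns off-policy from each transition.
        explorer.update(s, action, r_int, s_next)
```

In this sketch the two reward signals never mix: adding another intrinsic reward means appending an Explorer and one extra option, with no weighting coefficient such as `beta` to tune.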
