Deep Learning of Intrinsically Motivated Options in the Arcade Learning Environment

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: reinforcement learning, intrinsic motivation, auxiliary task learning, options, atari
Abstract: Although Intrinsic Motivation allows a Reinforcement Learning agent to generate directed behaviors in an environment, even under sparse or noisy rewards, combining intrinsic and extrinsic rewards is non-trivial. As an alternative to the widespread weighted sum of rewards, Explore Options let the agent call an intrinsically motivated agent in order to observe and learn from interesting behaviors in the environment. Such options have so far only been established for simple tabular cases and are unsuited to high-dimensional spaces. In this paper, we propose Deep Explore Options, a revision of Explore Options within the Deep Reinforcement Learning paradigm that tackles complex visual problems. Deep Explore Options can naturally learn from several unrelated intrinsic rewards, ignore harmful intrinsic rewards, learn to balance exploration, and isolate exploitative or exploratory behaviors. To achieve this, we first introduce J-PER, a new transition-selection algorithm based on the interest of multiple agents. Next, we propose treating intrinsic reward learning as an auxiliary task; the resulting architecture achieves $50\%$ faster wall-clock speed and builds a stronger, shared representation. We test Deep Explore Options on hard and easy exploration games of the Atari suite, following a benchmarking study to ensure fairness. Our results show that Deep Explore Options not only learn from multiple intrinsic rewards but are also a very strong alternative to a weighted sum of rewards, convincingly beating the baselines in 4 of the 6 tested environments and performing comparably in the other 2.
One-sentence Summary: We introduce Deep Explore Options to decouple Intrinsic and Extrinsic reward learning in Deep Reinforcement Learning, showcasing learning from multiple intrinsic rewards while achieving excellent performance in Atari.
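The contrast the abstract draws — a fixed weighted sum of extrinsic and intrinsic rewards versus an option call that hands control to an intrinsically motivated agent — can be sketched as follows. This is a minimal illustration of the two mechanisms, not the paper's implementation; all function names are hypothetical.

```python
def weighted_sum_reward(r_ext, r_int, beta=0.5):
    # Widespread baseline: a single agent maximizes a fixed mixture of
    # extrinsic and intrinsic reward, coupled through the weight beta.
    return r_ext + beta * r_int

def act_with_explore_option(state, task_policy, explore_policy, call_option):
    # Explore-option alternative: the task agent may instead invoke a
    # separate intrinsically motivated agent and learn off-policy from
    # the behavior it observes, keeping the two reward streams decoupled.
    if call_option(state):
        return explore_policy(state)  # intrinsic agent controls behavior
    return task_policy(state)         # extrinsic agent acts directly
```

The design point is that the weighted sum couples the two objectives inside one value function, whereas the option keeps them in separate agents, which is what lets harmful intrinsic rewards be ignored rather than averaged in.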
