Better state exploration using action sequence equivalence

Nathan Grinsztajn; Toby Johnstone; Johan Ferret; Philippe Preux

08 Oct 2022 (modified: 05 May 2023), Deep RL Workshop 2022, Readers: Everyone

Keywords: Reinforcement learning, priors, structure, exploration

TL;DR: We propose an exploration strategy to maximize new state visitations when we have the prior that different sequences of actions produce the same effect.

Abstract: Incorporating prior knowledge into reinforcement learning algorithms remains largely an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence, that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost by solving a convex optimization problem. By replacing the usual $\epsilon$-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures.
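To make the notion of action sequence equivalence concrete, here is a minimal illustrative sketch. It is not the method proposed in the paper, which solves a convex optimization problem. It assumes a hypothetical obstacle-free 2D grid in which the four moves commute and opposite moves cancel, so two action sequences are equivalent exactly when their net displacements match. The names `ACTIONS`, `effect`, and `count_distinct_effects` are introduced here for illustration only.

```python
from itertools import product

# Hypothetical action set for an obstacle-free 2D grid (an assumption,
# not taken from the paper): each action is a unit displacement.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def effect(sequence):
    """Return the net displacement produced by a sequence of actions."""
    x = sum(ACTIONS[a][0] for a in sequence)
    y = sum(ACTIONS[a][1] for a in sequence)
    return (x, y)


def count_distinct_effects(horizon):
    """Count how many different effects all sequences of a given length produce."""
    sequences = product(ACTIONS, repeat=horizon)
    return len({effect(s) for s in sequences})


if __name__ == "__main__":
    horizon = 2
    total = len(ACTIONS) ** horizon
    distinct = count_distinct_effects(horizon)
    # For horizon 2: 16 sequences collapse onto 9 distinct effects.
    print(f"{total} sequences, {distinct} distinct effects")
```

Under these assumptions, sampling two-step sequences uniformly spends much of its budget on sequences that collide on the same state (for example, "up, down" and "left, right" both return to the start), which is the kind of redundancy an equivalence-aware exploration strategy aims to reduce.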

Supplementary Material: zip
