Smart exploration in reinforcement learning using absolute temporal difference errors

Clement Gehring, Doina Precup

2013 (modified: 31 Mar 2022)AAMAS 2013Readers: Everyone

Abstract: Exploration is still one of the crucial problems in reinforcement learning, especially for agents acting in safety-critical situations. We propose a new directed exploration method, based on a notion of state controlability. Intuitively, if an agent wants to stay safe, it should seek out states where the effects of its actions are easier to predict; we call such states more controllable. Our main contribution is a new notion of controlability, computed directly from temporal-difference errors. Unlike other existing approaches of this type, our method scales linearly with the number of state features, and is directly applicable to function approximation. Our method converges to correct values in the policy evaluation setting. We also demonstrate significantly faster learning when this exploration strategy is used in large control problems.

0 Replies