Keywords: credit assignment, structure
TL;DR: We introduce a method for non-temporal credit assignment that propagates credit across structurally related states.
Abstract: In reinforcement learning, traditional value-based methods rely heavily on time as the main proxy for propagating information across the state space. This often results in slow learning and does not scale to large and complex environments. Here, we propose to leverage prior information about the structure of the environment to assign credit non-temporally and improve learning efficiency. Specifically, we introduce the concept of structural neighbours: sets of states that share similar semantic structure and have equivalent values under the optimal policy. We augment traditional value-based RL methods (TD(0), Dyna and Dueling DQN) with a learning mechanism based on structural neighbours. Our empirical results show that incorporating structural updates can greatly improve learning efficiency across a variety of environments, ranging from simple tabular grid worlds to those requiring function approximation, including the complex and high-dimensional game of Solitaire.
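To make the idea concrete, here is a minimal sketch of tabular TD(0) augmented with a structural update, assuming a user-supplied `neighbours(s)` mapping that returns states known a priori to share the same optimal value as `s`. The environment interface (`reset`, `actions`, `step`), the behaviour policy, and the structural learning rate `beta` are illustrative assumptions; the paper's exact mechanism may differ.

```python
import collections
import random

def td0_with_structural_updates(env, neighbours, episodes=500,
                                alpha=0.1, beta=0.1, gamma=0.99):
    """Tabular TD(0) with an additional structural-neighbour update (sketch)."""
    V = collections.defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            a = random.choice(env.actions(s))  # illustrative behaviour policy
            s_next, r, done = env.step(s, a)
            # Standard temporal TD(0) update.
            td_error = r + (0.0 if done else gamma * V[s_next]) - V[s]
            V[s] += alpha * td_error
            # Structural update: pull structurally related states toward V[s],
            # propagating credit without waiting for them to be visited in time.
            for s_prime in neighbours(s):
                V[s_prime] += beta * (V[s] - V[s_prime])
            s = s_next
    return V
```

The structural step exploits the assumption that structural neighbours have equivalent optimal values, so credit learned at one state can be shared with all of its neighbours in a single update.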
Track: Technical Paper
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.