Abstract: We consider tackling a single-agent RL problem by decomposing it into $n$ learners. These learners are generally trained \textit{egocentrically}: each is greedy with respect to its own local objective. In this extended abstract, we show theoretically and empirically that this leads to the presence of attractors: states that attract and detain the agent, contrary to what the global objective function advises.
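The attractor phenomenon can be illustrated with a minimal sketch (a hypothetical toy example chosen for illustration, not necessarily the paper's exact setup): a star-shaped world with 3 arms, a fruit (+1, terminal) at each arm's end, and the reward decomposed into $n = 3$ egocentric learners, one per fruit. Each learner is trained greedily on its own reward; actions are then selected greedily on the sum of the egocentric Q-values. Because every learner values the centre as if it alone will control the agent afterwards, `noop` at the centre outscores any move, and the agent is detained there forever:

```python
GAMMA = 0.9
ARMS, LENGTH = 3, 3

# States: 'C' (centre) or (arm, k), k = 1..LENGTH; (arm, LENGTH) holds a fruit.
states = ['C'] + [(i, k) for i in range(ARMS) for k in range(1, LENGTH + 1)]

def actions(s):
    if s == 'C':
        return ['noop'] + [('go', i) for i in range(ARMS)]
    i, k = s
    return [('in', i), ('out', i)]

def step(s, a):
    # Deterministic transitions.
    if s == 'C':
        return 'C' if a == 'noop' else (a[1], 1)
    i, k = s
    if a[0] == 'in':
        return (i, k + 1)
    return 'C' if k == 1 else (i, k - 1)

def is_fruit(s):
    return s != 'C' and s[1] == LENGTH

def value_iteration(reward):
    # reward(s_next) is received on entering s_next; fruit states are absorbing.
    V = {s: 0.0 for s in states}
    for _ in range(200):
        for s in states:
            if is_fruit(s):
                V[s] = 0.0
                continue
            V[s] = max(reward(step(s, a)) + GAMMA * V[step(s, a)]
                       for a in actions(s))
    return V

def q(V, reward, s, a):
    s2 = step(s, a)
    return reward(s2) + GAMMA * (0.0 if is_fruit(s2) else V[s2])

# Egocentric learner i only sees fruit i.
def ego_reward(i):
    return lambda s2: 1.0 if s2 == (i, LENGTH) else 0.0

ego_V = [value_iteration(ego_reward(i)) for i in range(ARMS)]

# Aggregated greedy policy at the centre: argmax of the summed egocentric Qs.
def summed_q(a):
    return sum(q(ego_V[i], ego_reward(i), 'C', a) for i in range(ARMS))

best_local = max(actions('C'), key=summed_q)
print('egocentric choice at centre:', best_local)   # noop -> attractor

# A global learner sees every fruit; its greedy choice leaves the centre.
glob_reward = lambda s2: 1.0 if is_fruit(s2) else 0.0
glob_V = value_iteration(glob_reward)
best_global = max(actions('C'), key=lambda a: q(glob_V, glob_reward, 'C', a))
print('global choice at centre:', best_global)      # ('go', i)
```

The numbers behind the detention: each learner values `noop` at the centre at $\gamma \cdot \gamma^{2} \approx 0.729$, so the sum is $\approx 2.187$, while moving into an arm yields $0.81 + 2 \times 0.6561 \approx 2.12$ because two of the three learners see their fruit recede. The globally trained learner prefers to move ($0.81 > 0.729$).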
TL;DR: We show that locally greedy optimisation of a decomposed RL problem creates an attractor phenomenon that compromises task completion.
Keywords: Reinforcement Learning, Hierarchical Reinforcement Learning