Guided Exploration in Deep Reinforcement Learning

27 Sep 2018 (modified: 21 Dec 2018) | ICLR 2019 Conference Blind Submission | Readers: Everyone
  • Abstract: This paper proposes a new method to drastically speed up deep reinforcement learning (deep RL) training for problems that have the property of \textit{state-action permissibility} (SAP). Two types of permissibility are defined under SAP. The first type says that after an action $a_t$ is performed in a state $s_t$ and the agent reaches the new state $s_{t+1}$, the agent can decide whether the action $a_t$ is \textit{permissible} or \textit{not permissible} in state $s_t$. The second type says that even without performing the action $a_t$ in state $s_t$, the agent can already decide whether $a_t$ is permissible or not in $s_t$. An action is not permissible in a state if the action can never lead to an optimal solution and thus should not be tried. We incorporate the proposed SAP property into two state-of-the-art deep RL algorithms to guide their state-action exploration. Results show that the SAP guidance can markedly speed up training.
  • Keywords: deep reinforcement learning, guided exploration, RL training speed up
  • TL;DR: Introduces a guided action exploration mechanism that drastically speeds up RL training.
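
The two permissibility types described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how SAP guidance is wired into the underlying deep RL algorithms, so the names `select_action`, `PermissibilityMemory`, and the tabular `q_values` dictionary are all hypothetical, chosen only to show the idea of skipping actions already judged not permissible. A callable passed as `is_permissible` stands in for the second type (a verdict available before acting), while `PermissibilityMemory` stands in for the first type (a verdict recorded only after the action has been tried).

```python
import random


def select_action(state, actions, q_values, is_permissible, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to actions judged permissible in `state`.

    `is_permissible(state, action)` is a hypothetical callable returning a bool.
    """
    allowed = [a for a in actions if is_permissible(state, a)]
    if not allowed:
        # Assumption: if every action is filtered out, fall back to the full set.
        allowed = list(actions)
    if rng.random() < epsilon:
        # Explore, but only among permissible actions.
        return rng.choice(allowed)
    # Exploit: highest estimated value among permissible actions.
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))


class PermissibilityMemory:
    """Stores verdicts that become known only after an action was executed."""

    def __init__(self):
        self.not_permissible = set()

    def record(self, state, action, permissible):
        # Called after observing the next state and judging the action taken.
        if not permissible:
            self.not_permissible.add((state, action))

    def __call__(self, state, action):
        # Unknown pairs are treated as permissible until shown otherwise.
        return (state, action) not in self.not_permissible
```

In this sketch, an action recorded as not permissible is never selected again in that state, even when its estimated value is the highest, which is one simple way exploration could be narrowed.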