Guided Exploration in Deep Reinforcement Learning

Sahisnu Mazumder; Bing Liu; Shuai Wang; Yingxuan Zhu; Xiaotian Yin; Lifeng Liu; Jian Li; Yongbing Huang

27 Sept 2018 (modified: 05 May 2023) | ICLR 2019 Conference Blind Submission | Readers: Everyone

Abstract: This paper proposes a new method to drastically speed up deep reinforcement learning (deep RL) training for problems that have the property of \textit{state-action permissibility} (SAP). Two types of permissibility are defined under SAP. The first type says that after an action $a_t$ is performed in a state $s_t$ and the agent reaches the new state $s_{t+1}$, the agent can decide whether the action $a_t$ is \textit{permissible} or \textit{not permissible} in state $s_t$. The second type says that even without performing the action $a_t$ in state $s_t$, the agent can already decide whether $a_t$ is permissible or not in $s_t$. An action is not permissible in a state if the action can never lead to an optimal solution and thus should not be tried. We incorporate the proposed SAP property into two state-of-the-art deep RL algorithms to guide their state-action exploration. Results show that the SAP guidance can markedly speed up training.
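The two permissibility types can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 1-D task (reach position 0), the assumed dynamics `next_state = state + action`, and the names `type1_permissible`, `type2_permissible`, and `guided_explore` are all hypothetical and chosen only to make the distinction concrete.

```python
import random

def type1_permissible(state, action, next_state):
    """Type 1: decided only after the transition has been observed.
    Toy assumption: an action is not permissible if it moved the
    agent farther from the target at position 0."""
    return abs(next_state) <= abs(state)

def type2_permissible(state, action):
    """Type 2: decided before acting, from (state, action) alone.
    Toy assumption: the dynamics next_state = state + action are known."""
    return abs(state + action) <= abs(state)

def guided_explore(state, actions, rng, permissible=type2_permissible):
    """Sample an exploratory action uniformly from the permissible ones.
    Falls back to the full action set if the filter rejects everything."""
    allowed = [a for a in actions if permissible(state, a)]
    return rng.choice(allowed or list(actions))

ACTIONS = (-1, 0, 1)   # hypothetical discrete action set
rng = random.Random(0)
state = 3

# Type 2 guidance: action +1 is filtered out before it is ever tried.
action = guided_explore(state, ACTIONS, rng)
next_state = state + action

# Type 1 check: the same judgment, but made after the step was taken.
was_permissible = type1_permissible(state, action, next_state)
```

In this sketch the type 2 predicate prunes the action set before sampling, whereas the type 1 predicate can only label a transition after the fact. A type 1 label could, for example, be used to train a predictor that later guides exploration, though that step is not shown here.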

Keywords: deep reinforcement learning, guided exploration, RL training speed up

TL;DR: Introduces a guided action exploration mechanism that drastically speeds up RL training.
