Faster Reinforcement Learning with Expert State Sequences
Xiaoxiao Guo, Shiyu Chang, Mo Yu, Miao Liu, Gerald Tesauro
Feb 15, 2018 (modified: Feb 15, 2018) · ICLR 2018 Conference Blind Submission
Abstract: Imitation learning relies on expert demonstrations. Existing approaches often require complete demonstration data, including both the state and action sequences. In this paper, we consider a realistic and more difficult scenario in which a reinforcement learning agent only has access to the state sequences of an expert, while the expert's actions are unobserved. Inferring the unseen expert actions in a stochastic environment is challenging, and usually infeasible when the state space is large. We propose a novel policy learning method that uses only the expert state sequences, without inferring the unseen actions. Specifically, our agent first learns to extract useful sub-goal information from the state sequences of the expert, and then uses the extracted sub-goals to factorize the action value estimate over state-action pairs and sub-goals. The extracted sub-goals are also used to synthesize guidance rewards during policy learning. We evaluate our agent on five Doom tasks. Our empirical results show that the proposed method significantly outperforms the conventional DQN method.
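To make the two mechanisms named in the abstract concrete, the sketch below illustrates a factorized action-value estimate Q(s, a | g) over state-action pairs and sub-goals, and a guidance reward that measures progress toward the current sub-goal. This is a minimal illustration under assumptions, not the paper's implementation: the function names (extract_subgoal, factorized_q, guidance_reward), the mean-pooling extractor, the bilinear scoring form, and all dimensions are hypothetical stand-ins for components the paper learns with neural networks.

import numpy as np

# Illustrative dimensions; the paper's actual network sizes are not given here.
STATE_DIM, GOAL_DIM, N_ACTIONS = 64, 32, 8

rng = np.random.default_rng(0)
W_q = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM, GOAL_DIM))  # per-action bilinear weights
W_g = rng.normal(scale=0.1, size=(STATE_DIM, GOAL_DIM))             # state -> goal-space map

def extract_subgoal(expert_states: np.ndarray) -> np.ndarray:
    """Stand-in sub-goal extractor: project a window of expert state
    embeddings (shape [T, STATE_DIM]) into goal space and mean-pool.
    The paper learns this extraction; mean-pooling is an assumption."""
    g = (expert_states @ W_g).mean(axis=0)
    return g / (np.linalg.norm(g) + 1e-8)

def factorized_q(state: np.ndarray, subgoal: np.ndarray) -> np.ndarray:
    """Factorized estimate: one bilinear score per action between the
    state features and the extracted sub-goal, giving Q(s, a | g)."""
    return np.einsum("d,adk,k->a", state, W_q, subgoal)

def guidance_reward(next_state: np.ndarray, subgoal: np.ndarray) -> float:
    """Synthetic shaping reward: cosine similarity between the reached
    state (mapped into goal space) and the current sub-goal."""
    v = next_state @ W_g
    return float(v @ subgoal / (np.linalg.norm(v) * np.linalg.norm(subgoal) + 1e-8))

# Usage: extract a sub-goal from a window of expert states, act greedily
# under the factorized Q, and shape the reward with sub-goal progress.
expert_window = rng.normal(size=(5, STATE_DIM))
g = extract_subgoal(expert_window)
s = rng.normal(size=STATE_DIM)
a = int(np.argmax(factorized_q(s, g)))
r_guide = guidance_reward(rng.normal(size=STATE_DIM), g)

In this reading, the factorization lets one set of state-action features be reused across sub-goals, and the guidance reward supplies a dense learning signal even when the environment reward is sparse; both points are consistent with the abstract, though the exact forms above are assumed.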