- Abstract: Imitation learning relies on expert demonstrations. Existing approaches often re- quire that the complete demonstration data, including sequences of actions and states are available. In this paper, we consider a realistic and more difficult sce- nario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are not available. Inferring the unseen ex- pert actions in a stochastic environment is challenging and usually infeasible when combined with a large state space. We propose a novel policy learning method which only utilizes the expert state sequences without inferring the unseen ac- tions. Specifically, our agent first learns to extract useful sub-goal information from the state sequences of the expert and then utilizes the extracted sub-goal information to factorize the action value estimate over state-action pairs and sub- goals. The extracted sub-goals are also used to synthesize guidance rewards in the policy learning. We evaluate our agent on five Doom tasks. Our empirical results show that the proposed method significantly outperforms the conventional DQN method.
- Keywords: Reinforcement Learning, Imitation Learning