- Abstract: Current deep neuroevolution models are usually trained in a large parameter search space for complex learning tasks, e.g. playing video games, which needs billions of samples and thousands of search steps to obtain significant performance. This raises a question of whether we can make use of sequential data generated during evolution, encode input samples, and evolve in low dimensional parameter space with latent state input in a fast and efficient manner. Here we give an affirmative answer: we train a VAE to encode input samples, then an RNN to model environment dynamics and handle temporal information, and last evolve our low dimensional policy network in latent space. We demonstrate that this approach is surprisingly efficient: our experiments on Atari games show that within 10M frames and 30 evolution steps of training, our algorithm could achieve competitive result compared with ES, A3C, and DQN which need billions of frames.
- Keywords: Neuroevolution, Reinforcement Learning