Abstract: AlphaGo trains policy networks with both supervised and reinforcement learning, and has different policy networks play millions of games in order to train a value network. The reinforcement learning part requires a massive amount of computation. We propose to train networks for computer Go so that a given accuracy is reached with far fewer examples. We modify the architecture of the networks in order to train them faster and to reach better accuracy in the end.
TL;DR: Improving the training of deep networks for computer Go by modifying the layers
Conflicts: www.lamsade.dauphine.fr
Keywords: Games, Supervised Learning, Deep Learning