TL;DR: Apply Monte Carlo Tree Search to episode generation in Alpha Zero
Abstract: Reinforcement learning methods that continuously train neural networks on episodes generated with game tree search have been successful in two-player, perfect-information, deterministic games such as chess, shogi, and Go. However, only practical successes have been reported, and there is little evidence to guarantee the stability and final performance of the learning process. This research focuses on the coordination of episode generation. By regarding the entire system as a game tree search, the new method can handle the trade-off between exploitation and exploration during episode generation. Experiments on a small problem showed that it performed more robustly than the existing method, Alpha Zero.
Keywords: Reinforcement Learning, Monte Carlo Tree Search, Alpha Zero