Abstract: Deep Reinforcement Learning reaches a superhuman level of play
in many complete-information games. The state-of-the-art algorithm
for learning with zero knowledge is AlphaZero. We take another
approach, Athénan, which uses a different, Minimax-based search
algorithm called Descent, uses different learning targets, and
does not use a policy. We show that, for multiple games, it
is much more efficient than Polygames, a reimplementation of
AlphaZero. It is even competitive with Polygames when Polygames
uses 100 times more GPU resources (at least for some games). One of the keys
to the superior performance is that the cost of generating state data
for training is approximately 296 times lower with Athénan. With
the same reasonable resources, Athénan without the reinforcement
heuristic is at least 7 times faster than Polygames, and much more
than 30 times faster with the reinforcement heuristic.