Learning to Play Against Any Mixture of Opponents
Abstract: Intuitively, experience playing against one mixture
of opponents in a given domain should be relevant
for a different mixture in the same domain.
We propose a transfer learning method, Q-Mixing,
that starts by learning Q-values against each pure-strategy
opponent. Then a Q-value for any distribution
of opponent strategies is approximated
by appropriately averaging the separately learned
Q-values. From these components, we construct
policies against all opponent mixtures without
any further training. We empirically validate Q-Mixing
in two environments: a simple grid-world
soccer environment, and a social dilemma game.
We find that Q-Mixing is able to successfully
transfer knowledge across any mixture of opponents.
We next consider the use of observations
during play to update the believed distribution of
opponents. We introduce an opponent classifier—
trained in parallel to Q-learning, reusing data—
and use the classifier results to refine the mixing
of Q-values. We find that Q-Mixing augmented
with the opponent policy classifier performs better,
though with higher variance, than training directly
against a mixed-strategy opponent.