Learning to Play Against Any Mixture of Opponents
Abstract: Intuitively, experience playing against one mixture
of opponents in a given domain should be relevant
for a different mixture in the same domain.
We propose a transfer learning method, Q-Mixing,
that starts by learning Q-values against each pure-strategy
opponent. Then a Q-value for any distribution
of opponent strategies is approximated
by appropriately averaging the separately learned
Q-values. From these components, we construct
policies against all opponent mixtures without
any further training. We empirically validate Q-Mixing
in two environments: a simple grid-world
soccer environment, and a social dilemma game.
We find that Q-Mixing is able to successfully
transfer knowledge across any mixture of opponents.
We next consider the use of observations
during play to update the believed distribution of
opponents. We introduce an opponent classifier—
trained in parallel to Q-learning, reusing data—
and use the classifier results to refine the mixing
of Q-values. We find that Q-Mixing augmented
with the opponent policy classifier performs better,
though with higher variance, than training directly
against a mixed-strategy opponent.