Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning
Abstract: Progress in the fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to a small number of interactions against experts, with the aim of reaching some desired level of performance (e.g. beating a human professional player). We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors, along with a population of forty-three tournament entries, some of which are intentionally sub-optimal. We describe metrics to measure the quality of agents based on both average returns and exploitability. We then show that several RL, online learning, and language model approaches can learn good counter-strategies and generalize well, but ultimately lose to the top-performing bots, creating an opportunity for research in multiagent learning.
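The two evaluation metrics named in the abstract, average return and exploitability, can be illustrated with a minimal sketch for single-shot Rock, Paper, Scissors. This is not the paper's evaluation code; the payoff-matrix convention and function names are assumptions made for illustration only.

```python
import numpy as np

# Payoff matrix for the row player (rows/cols: Rock, Paper, Scissors).
# The game is zero-sum, so the column player's payoff is the negation.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def average_return(p, q):
    """Expected per-round return of mixed strategy p against mixed strategy q."""
    return p @ A @ q

def exploitability(p):
    """How much a best-responding opponent can win per round against the
    fixed mixed strategy p (zero for the uniform strategy)."""
    # The opponent plays columns; their expected payoff vector against p
    # is -(p @ A), and a best response picks its maximum entry.
    return np.max(-(p @ A))

uniform = np.ones(3) / 3.0
always_rock = np.array([1.0, 0.0, 0.0])

print(exploitability(uniform))      # 0.0: uniform play cannot be exploited
print(exploitability(always_rock))  # 1.0: always-Rock loses every round to Paper
```

A strategy with zero exploitability can still score poorly in average return against a population of weak bots, which is why the benchmark considers both metrics.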
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
- Added authors.
- Added a new Table 4 with results of independent RL agents with long recalls (R = 10, R = 100), along with a new paragraph in Section 4.1 discussing those results.
- Fixed remaining typos and missed items as pointed out by the AE.
- Changed footnote to a revealed link that points to the full bot evaluation data in OpenSpiel (Figure 2).
- Changed footnote and text to reveal the identity of the LLM agent (Chinchilla) and its specific number of parameters.
Supplementary Material: pdf
Assigned Action Editor: ~Marcello_Restelli1
Submission Number: 1280