Abstract: Reinforcement Learning (RL) has emerged as a core algorithmic paradigm explicitly driving innovation in a growing number of industrial applications, including large language models and quantitative finance. Furthermore, computational neuroscience has long found evidence of natural forms of RL in biological brains. It is therefore crucial for the study of social dynamics to develop a scientific understanding of how RL shapes population behaviors. We leverage the framework of Evolutionary Game Theory (EGT) to provide building blocks and insights toward this objective. We propose a methodology for simulating large populations of RL agents in simple game-theoretic interaction models. More specifically, we derive fast and parallelizable implementations of two fundamental revision protocols from multi-agent RL, Policy Gradient (PG) and Learning with Opponent-Learning Awareness (LOLA), tailored for population simulations of random pairwise interactions in stateless normal-form games. Our methodology enables us to simulate large populations of 200,000 independent co-learning agents, yielding compelling insights into how non-stationarity-aware learners affect social dynamics.
In particular, we find that LOLA learners promote cooperation in the Stag Hunt model, delay cooperative outcomes in the Hawk-Dove model, and reduce strategy diversity in the Rock-Paper-Scissors model.
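The population setup described in the abstract (random pairwise interactions in a stateless normal-form game, with each agent updating its own policy by policy gradient) can be illustrated with a minimal sketch. This is not the authors' implementation: the payoff values, the one-logit-per-agent parameterization, the exact expected-payoff gradient, and the names `PAYOFF`, `pg_step`, and `simulate` are all assumptions made for illustration, and only the naive PG learner is shown (no LOLA correction).

```python
import math
import random

# Hypothetical Stag Hunt payoffs for the row player (assumed values):
# action 0 = Stag, action 1 = Hare.
PAYOFF = [[4.0, 0.0],
          [3.0, 3.0]]


def sigmoid(x):
    """Map a logit to the probability of playing Stag."""
    return 1.0 / (1.0 + math.exp(-x))


def pg_step(theta_i, theta_j, lr):
    """One exact policy-gradient ascent step for agent i against agent j.

    Each agent holds a single logit; the gradient is that of agent i's
    expected payoff with agent j's current policy held fixed.
    """
    p_i, p_j = sigmoid(theta_i), sigmoid(theta_j)
    u_stag = p_j * PAYOFF[0][0] + (1 - p_j) * PAYOFF[0][1]
    u_hare = p_j * PAYOFF[1][0] + (1 - p_j) * PAYOFF[1][1]
    grad = p_i * (1 - p_i) * (u_stag - u_hare)
    return theta_i + lr * grad


def simulate(n_agents=1000, n_rounds=200, lr=0.5, seed=0):
    """Randomly pair agents each round and let both members of a pair learn."""
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n_agents)]
    order = list(range(n_agents))
    for _ in range(n_rounds):
        rng.shuffle(order)  # random pairwise matching
        for k in range(0, n_agents - 1, 2):
            i, j = order[k], order[k + 1]
            # Simultaneous update: both steps use the pre-update logits.
            new_i = pg_step(thetas[i], thetas[j], lr)
            new_j = pg_step(thetas[j], thetas[i], lr)
            thetas[i], thetas[j] = new_i, new_j
    return [sigmoid(t) for t in thetas]


if __name__ == "__main__":
    probs = simulate()
    print("mean probability of Stag:", sum(probs) / len(probs))
```

The explicit Python loop keeps the sketch dependency-free; at the population sizes reported in the abstract, the per-pair update would more plausibly be expressed as array operations over all pairs at once.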
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Implemented most suggestions from the reviewers
- Replaced all mentions of "heterogeneity" with "independence" to avoid confusion with other notions, such as heterogeneity of agent capabilities (note: heterogeneity in our paper is limited to (1) independence of policies and learning rules, and (2) heterogeneity in final populations, especially in naive RPS)
- Fixed remaining typos
- Formatting compliant with TMLR's guidelines
- Layout optimizations
Code: https://github.com/MISTLab/RL-societies
Supplementary Material: zip
Assigned Action Editor: ~Marlos_C._Machado1
Submission Number: 5859