Reproducibility Study: Mastering cooperation between small LLMs within the Governance of the Commons Simulation
Abstract: Governance of the Commons Simulation (GovSim) is a Large Language Model (LLM) multi-agent framework designed to study cooperation and sustainability among LLM agents in resource-sharing environments (Piatti et al., 2024). Understanding the cooperation capabilities of LLMs is vital to the real-world applicability of these models. This reproducibility study aims to verify the claims of the original paper by replicating its experiments with small open-source LLMs and extending the framework. The original paper claims that (1) GovSim enables the study and benchmarking of emergent sustainable behavior, (2) only the largest and most powerful LLM agents achieve a sustainable equilibrium, while smaller models fail, and (3) agents using universalization-based reasoning significantly improve sustainability. To test the second claim, we conducted simulations with the small open-source models used in the original study. Additionally, by running the same experiments with small SOTA DeepSeek models, we achieved a sustainable equilibrium, which contradicts the second claim and suggests that recent advances in LLMs have improved the cooperation abilities of small models. Regarding the third claim, our results confirm that universalization-based reasoning improves performance in the GovSim environment. However, further analysis suggests that the improved performance stems primarily from the numerical instructions provided to agents rather than from the principle of universalization itself.
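To make the resource-sharing setting and the notion of a sustainable equilibrium concrete, below is a minimal illustrative sketch, not the authors' implementation: it assumes fishery-style dynamics in which the remaining stock doubles up to a fixed capacity after each round, and the constants CAPACITY and N_AGENTS, the fixed per-agent harvest policy, and the run helper are hypothetical stand-ins for the LLM agents' decisions.

```python
# Illustrative sketch of a GovSim-style commons (assumed dynamics, not the
# authors' code): a shared stock that doubles after each round, capped at
# CAPACITY. Fixed harvest fractions stand in for LLM agent decisions.

CAPACITY = 100.0   # assumed maximum stock of the shared resource
N_AGENTS = 5       # assumed number of agents sharing the commons

def run(harvest_per_agent: float, rounds: int = 12) -> float:
    """Simulate the commons for `rounds` rounds; return the final stock."""
    stock = CAPACITY
    for _ in range(rounds):
        # Each agent takes its target harvest, capped by its equal share
        # of whatever stock remains.
        stock -= N_AGENTS * min(harvest_per_agent, stock / N_AGENTS)
        if stock <= 0:          # the commons collapses; nothing regrows
            return 0.0
        stock = min(2 * stock, CAPACITY)  # resource doubles, up to capacity
    return stock

# Sustainable: the group leaves at least half the stock, so doubling restores it.
print(run(harvest_per_agent=10.0))  # stays at 100.0 (equilibrium)
# Unsustainable: over-harvesting shrinks the stock until it collapses.
print(run(harvest_per_agent=12.0))  # decays to 0.0
```

Under these assumed dynamics, a sustainable equilibrium requires the group to harvest no more than half the current stock per round; agents that exceed this threshold drive the commons to collapse within a few rounds.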
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yaodong_Yang1
Submission Number: 4296