Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Abstract: As large language models (LLMs) are increasingly deployed in multi-agent systems for cooperative tasks, understanding their decision-making behavior in resource-sharing scenarios becomes critically important for AI safety and governance. This study provides a comprehensive reproducibility analysis and theoretical extension of Piatti et al.'s $\texttt{GovSim}$ framework, which models cooperative behavior in multi-agent systems through the lens of common-pool resource dilemmas. Beyond faithfully reproducing the core claims regarding model-size dependencies and universalization principles, we contribute two theoretically motivated extensions that advance our understanding of LLM cooperation dynamics: (1) $\textbf{Loss aversion in resource framing}$, showing that negative resource framing (elimination of harmful resources) fundamentally alters agent behavior patterns, with models like $\texttt{GPT-4o-mini}$ succeeding in loss-framed scenarios while failing in gain-framed ones, a finding with significant implications for prompt engineering in cooperative AI systems; and (2) $\textbf{Heterogeneous influence dynamics}$, demonstrating that high-performing models can systematically elevate the cooperative behavior of weaker models through communication, enabling resource-efficient deployment strategies. These findings establish fundamental principles for deploying LLMs in cooperative multi-agent systems: cooperation emerges from model capability, resource framing significantly impacts behavioral stability, and strategic model mixing can amplify system-wide performance. Our work provides essential guidance for practitioners designing AI systems where multiple agents must cooperate to achieve shared objectives, from autonomous economic systems to collaborative robotics.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=qpu0mtKfh7
Changes Since Last Submission:
## 1. Major Theoretical Enhancements
### 1.1 Strengthened Theoretical Framework
- Incorporated formal grounding in **behavioral economics** (e.g., prospect theory and loss aversion), **game theory**, and **cognitive science**, providing a rigorous conceptual basis for our experiments.
- Developed **formal theoretical predictions** for each experimental extension, explicitly connecting them to established literature.
- Framed our empirical results within **AI safety and governance**, emphasizing their broader relevance beyond technical performance.
### 1.2 Deeper Analysis of Experimental Extensions
To address the request for expanded experiments and clearer insights, we enriched the analysis of each experimental setting:
- **Cross-architectural generalization**: Added a more detailed comparison between **Mixture-of-Experts (MoE)** and **dense models**, examining their effects on cooperative behavior and why this comparison is important.
- **Cross-linguistic consistency**: Strengthened the motivation by linking it to research on **cultural biases in AI systems** and challenges in **global deployment**.
- **Loss aversion framing**: Introduced a robust theoretical foundation based on **Kahneman & Tversky’s prospect theory**, highlighting behavioral differences under loss versus gain framing (see the sketch after this list).
- **Heterogeneous influence dynamics**: Expanded our analysis to include **peer influence effects** and formalized how resource allocation impacts cooperation.
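To make the prospect-theory grounding concrete, below is a minimal, self-contained sketch (our own illustration, not part of the $\texttt{GovSim}$ codebase) of the Kahneman–Tversky value function, using the commonly cited parameter estimates (loss-aversion coefficient λ ≈ 2.25, curvature α = β ≈ 0.88). It shows why the same objective change to a shared resource carries more subjective weight when described as a loss than as a gain, which is the mechanism our loss-framing extension appeals to.

```python
# Hypothetical illustration (not the GovSim implementation): the prospect-theory
# value function of Kahneman & Tversky (1979), with commonly cited parameter
# estimates lambda ≈ 2.25 and alpha = beta ≈ 0.88.

def prospect_value(x: float, alpha: float = 0.88, beta: float = 0.88,
                   lam: float = 2.25) -> float:
    """Subjective value of an outcome x relative to a reference point.

    Gains are evaluated concavely (x**alpha); losses are evaluated convexly
    and scaled by the loss-aversion coefficient lam > 1.
    """
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

# The same objective change of 10 units in the shared pool, framed two ways:
#   gain framing: "you preserved 10 units of the resource"
#   loss framing: "you destroyed 10 units of the resource"
gain_framed = prospect_value(+10.0)   # ≈ +7.59
loss_framed = prospect_value(-10.0)   # ≈ -17.07

print(f"gain framing: {gain_framed:+.2f}")
print(f"loss framing: {loss_framed:+.2f}")
# |loss_framed| > |gain_framed|: under this model, an identical outcome described
# as a loss is weighted more than twice as heavily, so a loss-averse evaluator
# penalizes over-harvesting more strongly when the scenario is framed negatively.
```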
---
## 2. Stronger Framing and Broader Impact
We revised the manuscript to better highlight the broader implications and clearly position our work within ongoing debates in AI research and practice.
### 2.1 Audience Alignment and Impact
- Aligned our contributions with major challenges in **multi-agent AI**, including **autonomous economic systems**, **collaborative robotics**, and **distributed intelligence**.
- Enhanced the discussion on **ethical considerations** and **AI safety**, ensuring responsible framing of scenarios and their implications.
### 2.2 Comprehensive Literature Integration
- Included foundational references to support our theoretical and experimental contributions:
- *Kahneman & Tversky (1979)* – Prospect Theory
- *Fehr & Gächter (2000)* – Reciprocity in economics
- *Henrich et al. (2001)* – Cultural fairness norms
- *Russell (2019)* – Human-compatible AI
- More thoroughly integrated our findings with literature on **behavioral AI** and **multi-agent systems**, reinforcing the novelty and relevance of the work.
Assigned Action Editor: ~Marc_Lanctot1
Submission Number: 4983