
<a name="readme-top"></a>




<br />
<div align="center">

<h3 align="center">Strategic Insights: Evaluating Large Language Models' Decision-Making in Multi-Player Game-Theoretic Environments
</h3>


</div>



<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li><a href="#system-requirements">System Requirements</a></li>
    <li><a href="#original-data">Original Data</a></li>
    <li><a href="#processed-data-provided-to-ai">Processed Data Provided to AI</a></li>
    <li>
      <a href="#complete-communication-history-between-human-authors-and-ai">Complete Communication History between Human Author(s) and AI</a>
    </li>
    <li><a href="#finalized-jupyter-notebook-based-on-code-generated-by-ai">Finalized Jupyter Notebook based on Code Generated by AI</a></li>
  </ol>
</details>



## System Requirements

- Install the dependencies listed in `./requirements.txt`, e.g., with `pip install -r requirements.txt`.
- The finalized Jupyter notebook, built from code generated by the AI, runs on a free-tier Google Colab instance with a total execution time of under 30 minutes.



## Original Data

The original code and data for measuring LLMs' Gaming Ability in Multi-Agent environments across the twelve language agents are available at https://github.com/CUHK-ARISE/GAMABench, released alongside the published paper [Huang et al., 2025](https://openreview.net/forum?id=DI4gW8viB6).


<p align="right">(<a href="#readme-top">back to top</a>)</p>


## Processed Data Provided to AI

The processed results (JSON files) provided to the AI are stored in the `processed_results` directory, organized into six subdirectories, one per game:

| Game Name                | Directory                         |
|--------------------------|-----------------------------------|
| Guess 2/3 of the Average | `processed_results/guessing_game` |
| Divide the Dollar        | `processed_results/divide_dollar` |
| Public Goods Game        | `processed_results/public_goods`  |
| Diner's Dilemma          | `processed_results/diner_dilemma` |
| Battle Royale            | `processed_results/battle_royale` |
| Pirate Game              | `processed_results/pirate_game`   |
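As a minimal sketch of how this layout might be traversed, the snippet below maps each game to its subdirectory (taken from the table above) and collects the JSON files in each. The function name `list_result_files` is hypothetical and not part of the released notebook; it only lists files and makes no assumption about the JSON contents, whose schema is not described here.

```python
from pathlib import Path

# Game name -> subdirectory under processed_results, copied from the table above.
GAME_DIRS = {
    "Guess 2/3 of the Average": "guessing_game",
    "Divide the Dollar": "divide_dollar",
    "Public Goods Game": "public_goods",
    "Diner's Dilemma": "diner_dilemma",
    "Battle Royale": "battle_royale",
    "Pirate Game": "pirate_game",
}


def list_result_files(root="processed_results"):
    """Hypothetical helper: return {game name: sorted list of JSON paths}.

    A missing game directory yields an empty list instead of raising.
    """
    root = Path(root)
    files = {}
    for game, subdir in GAME_DIRS.items():
        game_dir = root / subdir
        files[game] = sorted(game_dir.glob("*.json")) if game_dir.is_dir() else []
    return files


if __name__ == "__main__":
    # Print how many result files were found for each game.
    for game, paths in list_result_files().items():
        print(f"{game}: {len(paths)} files")
```

Returning an empty list for a missing directory keeps the helper usable on a partial checkout; raising an error instead would be a reasonable alternative if every directory is required.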

Each JSON file follows the naming pattern `<LLM id>_<game directory>_v1_run<run>.json`, where:

- `<LLM id>` identifies the single LLM evaluated in that run and is one of the twelve ids `JLI84K7`, `my9FQ38`, `gvHK3Q2`, `Z3cCMo0`, `qqOBXB0`, `3P26cpI`, `X9x73kd`, `HzpuDbC`, `jHLiFlg`, `RfelEFA`, `xoEciVX`, or `pKbLE9I`,
- `<game directory>` is the name of the game directory, identifying one of the six games: `guessing_game`, `divide_dollar`, `public_goods`, `diner_dilemma`, `battle_royale`, or `pirate_game`,
- `<run>` is the trial number: 1, 2, 3, 4, or 5.
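The naming rule above can be checked programmatically. The sketch below is an illustration rather than code from the notebook: it assumes the LLM ids appear in file names without angle brackets (e.g., `JLI84K7_guessing_game_v1_run1.json`), and the helpers `parse_filename`, `build_filename`, and `expected_filenames` are hypothetical names introduced here.

```python
import re
from itertools import product

# The twelve LLM ids, six game directories, and five runs listed above.
LLM_IDS = {"JLI84K7", "my9FQ38", "gvHK3Q2", "Z3cCMo0", "qqOBXB0", "3P26cpI",
           "X9x73kd", "HzpuDbC", "jHLiFlg", "RfelEFA", "xoEciVX", "pKbLE9I"}
GAMES = ["guessing_game", "divide_dollar", "public_goods",
         "diner_dilemma", "battle_royale", "pirate_game"]
RUNS = range(1, 6)

# Assumption: ids contain no underscores, so the first "_" ends the LLM id.
FILENAME_RE = re.compile(
    r"^(?P<llm>[A-Za-z0-9]+)_(?P<game>" + "|".join(GAMES) + r")_v1_run(?P<run>[1-5])\.json$"
)


def parse_filename(name):
    """Split a result file name into (llm_id, game_directory, run), or return None."""
    m = FILENAME_RE.match(name)
    if m is None or m["llm"] not in LLM_IDS:
        return None
    return m["llm"], m["game"], int(m["run"])


def build_filename(llm_id, game, run):
    """Inverse of parse_filename: assemble a name following the pattern."""
    return f"{llm_id}_{game}_v1_run{run}.json"


def expected_filenames():
    """All 12 x 6 x 5 = 360 names, assuming every combination was run."""
    return {build_filename(l, g, r) for l, g, r in product(sorted(LLM_IDS), GAMES, RUNS)}
```

Comparing `expected_filenames()` against the names actually present under `processed_results` would reveal any absent combination; note that the README does not state that all 360 combinations exist.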


<p align="right">(<a href="#readme-top">back to top</a>)</p>


## Complete Communication History between Human Author(s) and AI

The complete communication history between the human author(s) and the AI, including all prompts, the AI's thinking, and its responses, is recorded in:

- `prompts_and_responses.md`



<p align="right">(<a href="#readme-top">back to top</a>)</p>


## Finalized Jupyter Notebook based on Code Generated by AI

The finalized Jupyter notebook that reproduces our results, built on the code generated by the AI, is:

- `reproducing_results.ipynb`

This finalized version is the result of several rounds of debugging and improvement carried out primarily by the AI; the full history is also recorded in `prompts_and_responses.md`.



<p align="right">(<a href="#readme-top">back to top</a>)</p>


