## Contents
- [Installation](#installation)
- [APIs](#apis)
- [Lmgame Bench](#lmgame-bench)
  - [Setup](#setup)
  - [Single-Model Performance](#single-model-performance)
  - [Agentic Performance](#agentic-performance)
  - [Understanding Game Performance](#understanding-game-performance)

## Installation

Install dependency:
```
conda create -n lmgame python==3.10 -y
conda activate lmgame
pip install -e .
```

## APIs

Currently we support gaming agents based on the following models:

- OpenAI:
  - o3-mini, o3
  - o1
  - gpt-4o
  - gpt-4o-mini
- Anthropic:
  - claude-3-7-sonnet (with thinking mode)
  - claude-3-5-haiku, claude-3-5-sonnet
- Gemini:
  - gemini-2.5-pro, gemini-2.5-flash
  - gemini-2.0-flash-thinking-exp
  - gemini-2.0-pro, gemini-2.0-flash
  - gemini-1.5-pro
- xAI:
  - grok-3-mini
- Deepseek:
  - reasoner (R1)
  - chat (V3)
 
To test the models yourself, set your API keys in `credentials.sh` with:

```
export OPENAI_API_KEY={YOUR_OPENAI_API_KEY}
export ANTHROPIC_API_KEY={YOUR_ANTHROPIC_API_KEY}
export GEMINI_API_KEY={YOUR_GEMINI_API_KEY}
export XAI_API_KEY={YOUR_XAI_API_KEY}
export DEEPSEEK_API_KEY={YOUR_DEEPSEEK_API_KEY}
```

⚠️ **Evaluating or deploying the agents with high-end models could incur higher costs!**

## Lmgame Bench

### Setup

#### Gym and Retro Interface

##### Gymnasium Envrionments

We standardize our gaming envrionment interfaces following [Gymnasium](https://github.com/Farama-Foundation/Gymnasium).

Currently our evaluation suite composes of the following games using gym envrionments:

- Sokoban
- Tetris
- 2048
- Candy Crush

Most games are runnable out-of-the-box with no additional setup. For Pokemon Red, you need to place the ROM file in the designated directory:

##### Retro Envrionments

Stable [Retro](https://github.com/Farama-Foundation/stable-retro) is a library that enables classic video game emulation through a wide range of supported systems, providing a standardized interface via Gymnasium.

To run classical games implemented on Retro, you need to legally obtain the games files and import them with [this instruction](https://stable-retro.farama.org/getting_started/):


```
python3 -m retro.import /path/to/your/ROMs/directory/
```


Currently, our evaluation suite includes the following games from Retro environments:
- Super Mario Bros 1985


We have also integrated additional Retro environments that are not included in stable-retro.
For these games, no `retro.import` is required. To enable the envrionments, simply place the ROM file into the designated directory.

For example, for Ace Attorney: Phoenix Wright, place the ROM file into:
```
gamingagent/envs/retro_02_ace_attorney/AceAttorney-GbAdvance
```

Additional games we integrated:
- Ace Attorney: Phoenix Wright

### Single-Model Performance

Launch multiple evaluation instances (in parallel) for a model on different games with the following commands:

```
python3 lmgame-bench/run.py --model_name {model_name} --game_names {list_of_games} --harness_mode false
```

To multiple models in parallel, run the following script:

```
bash lmgame-bench/evaluate_all.sh
```

### Agentic Performance

Evaluate a model's performance in gaming agent (with gaming harness support), run the following command:

```
python3 lmgame-bench/run.py --model_name {model_name} --game_names {list_of_games} --harness_mode true
```

##### Command options

```
--harness_mode: if to evaluate the model using agentic workflow, choice of ["true", "false", "both"].
--max_parallel_procs: max parallel instances to run.
--game_names: list of games to evaluated on, e.g. "sokoban,tetris,candy_crush,twenty_forty_eight".

Currently supported games:
- sokoban
- tetris
- candy_crush
- twenty_forty_eight
- super_mario_bros
- ace_attorney
```

### Customize Your Settings

`run.py` launches multiple instances of `single_agent_runner.py`. To run single model in a single game setting, run `python3 lmgame-bench/single_agent_runner.py --game_name {game_name} --model_name {model_name} --config_root_dir {path_to_gaming_agent_config} (--harness)`. 

Adjust gaming-agent related configurations in `gamingagent/configs/{game_env_dir}/config.yaml`. 

Propmts can be found in `gamingagent/configs/{game_env_dir}/module_prompts.json`.
