# Instructions to Run the Code

### Part 1: Experiments in the known-model setting

#### Running normal-form games
```bash
# Stag Hunt
python run_NormalTraining.py --game StagHunt -K 2000000 --beta 25. --algo PPO -T 500 --lr 0.01 --obj MinGap --seed ...

python plot_trajectory.py --game StagHunt --obj MinGap -K 2000000 -T 500 --beta 25.0 --model-type Normal --grid-number 10

# Matching Pennies
python run_NormalTraining.py --game ZeroSum -K 2000000 --beta 25. --algo PPO -T 500 --lr 0.01 --obj Nash --seed ...

python plot_trajectory.py --game ZeroSum --obj Nash -K 2000000 -T 500 --beta 25.0 --model-type Normal --grid-number 10
```
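The training and plotting commands repeat the same `--game`, `--obj`, `-K`, `-T` and `--beta` values, and presumably need to agree for `plot_trajectory.py` to find the training output (an assumption; we have not checked how the script locates results). A minimal sketch of a hypothetical launcher, `normal_form_commands`, that writes those shared flags once and emits one training command per seed plus a single plotting command. It assumes the scripts accept flags in any order, as argparse-based scripts do; the seeds are placeholders because the README leaves them unspecified.

```python
import shlex

# Hypothetical helper (not part of the repository): builds the training and
# plotting commands for one normal-form game from a single set of shared
# flags, so --game, --obj, -K, -T and --beta cannot drift apart.
def normal_form_commands(game, obj, seeds, K=2_000_000, T=500, beta=25.0,
                         algo="PPO", lr=0.01, grid_number=10):
    shared = ["--game", game, "--obj", obj,
              "-K", str(K), "-T", str(T), "--beta", str(beta)]
    cmds = []
    # One training run per seed.
    for seed in seeds:
        cmds.append(["python", "run_NormalTraining.py", *shared,
                     "--algo", algo, "--lr", str(lr), "--seed", str(seed)])
    # A single plotting call; as in the README, it takes no --seed flag.
    cmds.append(["python", "plot_trajectory.py", *shared,
                 "--model-type", "Normal", "--grid-number", str(grid_number)])
    return cmds


if __name__ == "__main__":
    # Seeds 0, 1, 2 are placeholders; the README leaves the seeds unspecified.
    for cmd in normal_form_commands("ZeroSum", "Nash", seeds=[0, 1, 2]):
        print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or each `cmd` can be passed to `subprocess.run(cmd, check=True)` to launch the runs directly.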

#### Running the grid-world version of Stag Hunt
```bash
# training
python StagHunt_GridWorld/run_game.py --train-batch-size 128 --steer-epochs-per-update 50 --lr 0.001 --beta 25 --seed ...

# evaluation
python StagHunt_GridWorld/evaluation.py --train-batch-size 128 --steer-epochs-per-update 50 --lr 0.001 --beta 25 --seed ...
```
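The evaluation command reuses every training flag verbatim. A minimal sketch, with hypothetical helpers `gridworld_commands` and `run_pipeline`, that builds both commands from one flag dictionary and, when not in dry-run mode, runs evaluation only after training succeeds. We assume `evaluation.py` needs the identical flags to find the matching trained run; that is an inference from the README, not something we have verified in the source.

```python
import shlex
import subprocess

# Hyperparameters copied from the README. Both scripts receive exactly the
# same flags; we assume evaluation.py relies on them to find the matching
# training run, but have not checked its source.
GRID_FLAGS = {
    "--train-batch-size": 128,
    "--steer-epochs-per-update": 50,
    "--lr": 0.001,
    "--beta": 25,
}


def gridworld_commands(seed, flags=GRID_FLAGS):
    """Return (train, evaluate) argv lists that share every flag."""
    args = [tok for key, value in flags.items() for tok in (key, str(value))]
    args += ["--seed", str(seed)]
    train = ["python", "StagHunt_GridWorld/run_game.py", *args]
    evaluate = ["python", "StagHunt_GridWorld/evaluation.py", *args]
    return train, evaluate


def run_pipeline(seed, dry_run=True):
    for cmd in gridworld_commands(seed):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on failure, so evaluation
            # never starts if training exited with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(seed=0)  # seed 0 is a placeholder; dry run only prints
```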

### Part 2: Experiments in the unknown-model setting with a small model set
#### Step 1: Training
```bash
# oracle policy for f_{mu=0.7}
python run_Exp_SmallModel.py --game StagHunt -K 2000000 --beta 70.0 --algo PPO -T 500 --lr 0.01 --obj MinGap --mu 0.7 --sigma 0.3 --model-type Gaussian_lr --seed ...

# oracle policy for f_{mu=1.0}
python run_Exp_SmallModel.py --game StagHunt -K 2000000 --beta 20.0 --algo PPO -T 500 --lr 0.01 --obj MinGap --mu 1.0 --sigma 0.3 --model-type Gaussian_lr --seed ...

# belief state based policy
python run_Exp_SmallModel.py --game StagHunt -K 2000000 --beta 70.0 20.0 --algo PPO -T 500 --lr 0.01 --obj MinGap --mu 0.7 1.0 --sigma 0.3 --model-type Gaussian_lr --seed ...
```
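In the belief-state command, `--beta 70.0 20.0` lines up position by position with `--mu 0.7 1.0`, matching the two oracle commands (`mu=0.7` with `beta=70.0`, `mu=1.0` with `beta=20.0`). A minimal sketch, using a hypothetical `small_model_commands` helper, that derives all three commands from one list of `(mu, beta)` pairs so the oracle runs and the belief-state run cannot fall out of alignment. The positional pairing is our reading of the README, not a documented contract of `run_Exp_SmallModel.py`.

```python
import shlex

# (mu, beta) pairs taken from the two oracle commands in the README. We
# assume the belief-state run pairs --mu and --beta by position.
MODELS = [(0.7, 70.0), (1.0, 20.0)]

# Flags shared by all three commands, copied from the README.
COMMON = ["--game", "StagHunt", "-K", "2000000", "--algo", "PPO", "-T", "500",
          "--lr", "0.01", "--obj", "MinGap", "--sigma", "0.3",
          "--model-type", "Gaussian_lr"]


def small_model_commands(seed, models=MODELS):
    base = ["python", "run_Exp_SmallModel.py", *COMMON]
    cmds = []
    # One oracle policy per model f_mu, each with its own beta.
    for mu, beta in models:
        cmds.append(base + ["--beta", str(beta), "--mu", str(mu),
                            "--seed", str(seed)])
    # The belief-state policy receives every model at once; zip(*models)
    # splits the pairs into two parallel tuples, preserving their order.
    mus, betas = zip(*models)
    cmds.append(base + ["--beta", *map(str, betas), "--mu", *map(str, mus),
                        "--seed", str(seed)])
    return cmds


if __name__ == "__main__":
    # Seed 0 is a placeholder; the README leaves the seeds unspecified.
    for cmd in small_model_commands(seed=0):
        print(shlex.join(cmd))
```

Adding a third model then only requires appending one pair to `MODELS`.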


### Part 3: Experiments in the unknown-model setting with a large model set
#### Step 1: Train exploration policy
```bash
# a --shift value of 0.0 is treated as +infinity
python run_Strategic_Explore.py --game MP_Cooperative --num-players 10 -K 5000000 --beta 100.0 --algo PPO -T 30 --lr 0.01 --obj Explore --model-type ValueAware --sigma 0.5 --shift 0.0 0.5 1.0 1.5 --seed ...
```
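The `--shift` list uses 0.0 as a stand-in for +∞. A minimal sketch of that convention as a hypothetical `decode_shift` function; it only illustrates the mapping described in the command's comment and is not the code inside `run_Strategic_Explore.py`.

```python
import math


# Hypothetical decoder for the convention "0.0 means +infinity"; the real
# script may implement this differently.
def decode_shift(value: float) -> float:
    return math.inf if value == 0.0 else value


def decode_shifts(values):
    return [decode_shift(v) for v in values]


print(decode_shifts([0.0, 0.5, 1.0, 1.5]))  # [inf, 0.5, 1.0, 1.5]
```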

#### Step 2: Evaluate the random exploration strategy
```bash
python run_Random_Explore.py -T 0.1 0.2 0.3 0.5 1.0 2.0 3.0 --num-eval 100 --game MP_Cooperative --num-players 10 --lr 0.01 --model-type ValueAware --shift 0.0 0.5 1.0 1.5 --act-dim 1
```
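This command passes several values to `-T` and to `--shift` in a single invocation. Assuming the script declares such flags with argparse's `nargs="+"` (we have not checked its source), the hypothetical, reduced parser below shows how the values would arrive as Python lists. It covers only the flags used in this command and is not the script's actual parser.

```python
import argparse


# Hypothetical, reduced parser covering only the flags in the
# run_Random_Explore.py command; the real script may define them differently.
def build_parser():
    p = argparse.ArgumentParser()
    p.add_argument("-T", type=float, nargs="+")      # several values
    p.add_argument("--num-eval", type=int, default=100)
    p.add_argument("--game", type=str)
    p.add_argument("--num-players", type=int)
    p.add_argument("--lr", type=float)
    p.add_argument("--model-type", type=str)
    p.add_argument("--shift", type=float, nargs="+")  # several values
    p.add_argument("--act-dim", type=int)
    return p


argv = ("-T 0.1 0.2 0.3 0.5 1.0 2.0 3.0 --num-eval 100 --game MP_Cooperative "
        "--num-players 10 --lr 0.01 --model-type ValueAware "
        "--shift 0.0 0.5 1.0 1.5 --act-dim 1").split()
args = build_parser().parse_args(argv)
print(args.T)            # [0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 3.0]
print(args.shift)        # [0.0, 0.5, 1.0, 1.5]
print(args.num_players)  # 10
```

With `nargs="+"`, each list-valued flag consumes values until the next option string, which is why the flags can appear in any order.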

#### Step 3: Compare Oracle and FETE
```bash
# run the oracle policy
python run_Exp_LargeModel.py --game MP_Cooperative --num-players 10 -K 2000000 --beta 10.0 --algo PPO -T 500 --lr 0.01 --obj Nash --sigma 0.5 --shift ... --explore-steps 0 --seed ...

# run the exploit policy
python run_Exp_LargeModel.py --game MP_Cooperative --num-players 10 -K 2000000 --beta 10.0 --algo PPO -T 470 --lr 0.01 --obj Nash --sigma 0.5 --shift ... --explore-steps 0 --seed ...
```
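The oracle run uses `-T 500` and the exploit run `-T 470`, while the exploration policy in Step 1 is trained with `-T 30`. These numbers are consistent with the exploit phase covering the remainder of the oracle's horizon after 30 exploration steps (30 + 470 = 500); we assume that is the intent, though the README does not say so. A minimal sketch, with a hypothetical `large_model_commands` helper, that derives the exploit horizon from the other two and builds both commands. The shift value and seed are left to the caller, as in the README.

```python
import shlex

# Horizons from the README. The split below is our reading of the numbers,
# not something the README states explicitly.
ORACLE_T = 500
EXPLORE_T = 30
EXPLOIT_T = ORACLE_T - EXPLORE_T  # 470

# Flags shared by the oracle and exploit commands, copied from the README.
COMMON = ["--game", "MP_Cooperative", "--num-players", "10", "-K", "2000000",
          "--beta", "10.0", "--algo", "PPO", "--lr", "0.01", "--obj", "Nash",
          "--sigma", "0.5", "--explore-steps", "0"]


def large_model_commands(shift, seed):
    """Return (oracle, exploit) argv lists for one shift value and seed."""
    tail = ["--shift", str(shift), "--seed", str(seed)]
    oracle = ["python", "run_Exp_LargeModel.py", *COMMON,
              "-T", str(ORACLE_T), *tail]
    exploit = ["python", "run_Exp_LargeModel.py", *COMMON,
               "-T", str(EXPLOIT_T), *tail]
    return oracle, exploit


if __name__ == "__main__":
    # shift=1.0 and seed=0 are placeholders, not values from the README.
    for cmd in large_model_commands(shift=1.0, seed=0):
        print(shlex.join(cmd))
```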
