# Zero-Sum Positional Differential Games as a Framework for Robust Adversarial Reinforcement Learning: Deep Q-Learning Approach

This repository is the official implementation of the experiments from the paper "Zero-Sum Positional Differential Games as a Framework for Robust Adversarial Reinforcement Learning: Deep Q-Learning Approach".

## Requirements 
 
Running the experiments requires Python 3.9.16. To install the requirements, run the command
    
```    
pip install -r requirements.txt    
```
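To confirm that your interpreter matches the required version, you can run a small check like the one below. It is an optional helper that is not part of the repository. It only compares version numbers and prints a warning on a mismatch.

```python
import sys

REQUIRED_PYTHON = (3, 9, 16)  # version stated in this README


def python_version_matches(required=REQUIRED_PYTHON, actual=None):
    """Return True if the running (or given) interpreter version equals `required`."""
    if actual is None:
        actual = tuple(sys.version_info[:3])
    # Compare only major, minor, and micro; ignore release level and serial.
    return tuple(actual[:3]) == tuple(required)


if __name__ == "__main__":
    if not python_version_matches():
        current = ".".join(map(str, sys.version_info[:3]))
        print(f"Warning: found Python {current}, the experiments expect 3.9.16")
```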

## Experiment scheme

We use the following training and evaluation scheme in our experiments:

![Evaluation scheme](pics/pic_scheme.png)

In the first stage, the agents learn (in a decentralized or centralized manner, depending on the algorithm). In the second stage, we fix the trained policy $\pi_u$ of the first agent and solve the resulting single-agent Reinforcement Learning problem from the second agent's point of view, using various baseline Reinforcement Learning algorithms with various hyperparameters. We then take the maximum value of the quality index (sum of rewards) over these runs and append it to the *maximum values of the quality index* array. We believe this maximum value approximates the guaranteed result $V_u^{\pi_u}$. The third stage is symmetric to the second one: we fix the second agent's policy $\pi_v$, solve the single-agent problem from the first agent's point of view, and append the minimum value of the quality index to the *minimum values of the quality index* array, which gives an estimate of $V_v^{\pi_v}$. We repeat these three stages 5 times, accumulating the *maximum values of the quality index* and *minimum values of the quality index* arrays, and then illustrate the data of these arrays as shown in the figure above. The boldest bar shows the agents' best guaranteed results over the 5 runs, the middle bar shows the mean values, and the faintest bar shows the worst results over the 5 runs. Such a visualization lets us draw conclusions about the algorithm's efficiency, robustness, and stability.
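The bookkeeping of the second and third stages and the three bars can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the repository: the function names and the numbers are ours, and it assumes that the first agent minimizes and the second agent maximizes the quality index, so a lower estimate of $V_u^{\pi_u}$ and a higher estimate of $V_v^{\pi_v}$ count as better.

```python
from statistics import mean


def guaranteed_result_estimate(run_results, maximize):
    """Extreme quality index over all baseline runs against one fixed policy.

    maximize=True: stage 2, the second agent plays against the fixed pi_u
    (estimate of V_u^{pi_u}); maximize=False: stage 3, the first agent plays
    against the fixed pi_v (estimate of V_v^{pi_v}).
    """
    return max(run_results) if maximize else min(run_results)


def summarize(values, lower_is_better):
    """Best, mean, and worst value over the repetitions (the three bars)."""
    if lower_is_better:
        best, worst = min(values), max(values)
    else:
        best, worst = max(values), min(values)
    return {"best": best, "mean": mean(values), "worst": worst}


# Hypothetical quality indices: 5 repetitions, 2 baseline runs per stage each.
stage2_runs = [[1.0, 2.0], [3.0, 1.5], [2.0, 2.5], [0.5, 1.0], [4.0, 3.0]]
stage3_runs = [[-1.0, 0.0], [-2.0, -0.5], [0.5, 1.0], [-1.5, -1.0], [0.0, -3.0]]

# "maximum values" and "minimum values of the quality index" arrays.
max_values = [guaranteed_result_estimate(r, maximize=True) for r in stage2_runs]
min_values = [guaranteed_result_estimate(r, maximize=False) for r in stage3_runs]

# Assumption: the first agent minimizes, so a lower V_u estimate is better for it.
first_agent_bars = summarize(max_values, lower_is_better=True)
second_agent_bars = summarize(min_values, lower_is_better=False)
```

Here the first agent's bars are computed from `max_values = [2.0, 3.0, 2.5, 1.0, 4.0]`, so its best, mean, and worst estimates are 1.0, 2.5, and 4.0.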

## Running

To start one set of experiments, run the command  
    
```    
python run.py --config_path <path to config_file>  
```

For example,
```    
python run.py --config configs/config_example.json
```

The experiment results are written to the folder specified by the *result_path* key in the config file. For <code>config_example.json</code>, this folder is <code>data_example</code>.

#### **Config file structure** 

The config file is a JSON file with the following 5 required keys:

- *result_path* - the path to the folder where the experiment results are saved;
- *seeds* - the list of random seeds used in the runs;
- *envs* - the list of environments (games). Each environment is described by a dict whose keys are the environment's parameters, two of which are required:
    - *env_name* - the name of the environment;
    - *timestaps* - the number of training timesteps;
- *learning* - the list of two-agent RL algorithms. Each algorithm is described by a dict whose keys are its parameters, of which the following are required:
    - *alg_name* - the name of the algorithm;
    - *action_n* - the size of the action space discretization (required for 2xDDQN, MADQN, CounterDQN, NashDQN, IDQN, DIDQN). This parameter should be of the form $m^n$, where $m$ is the number of discretization points per action dimension and $n$ is the dimension of the action space;
    - *subalg_name* - the name of the single-agent RL algorithm used in the agents' alternating learning (required for RARL);
    - *subtimesteps* - the number of timesteps each agent trains during its turn in alternating learning (required for RARL);
- *testing* - the list of single-agent RL algorithms for the evaluation stages. Each algorithm is described by a dict whose keys are its parameters, of which the following are required:
    - *alg_name* - the name of the algorithm;
    - *action_n* - the size of the action space discretization (required for DDQN). This parameter should be of the form $m^n$, where $m$ is the number of discretization points per action dimension and $n$ is the dimension of the action space.
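The requirement that *action_n* equal $m^n$ is natural if the continuous action box is discretized by a uniform grid with $m$ points per dimension. The sketch below shows that reading; it is an assumption for illustration, the function names are hypothetical, and the repository's actual discretization may place the points differently.

```python
import itertools


def points_per_dimension(action_n, action_dim):
    """Return m such that m ** action_dim == action_n, or raise ValueError."""
    m = round(action_n ** (1.0 / action_dim))
    for candidate in (m - 1, m, m + 1):  # guard against float rounding
        if candidate >= 1 and candidate ** action_dim == action_n:
            return candidate
    raise ValueError(f"action_n={action_n} is not of the form m**{action_dim}")


def uniform_action_grid(action_n, low, high):
    """All m**n actions of a uniform grid over the box [low, high], one bound per dimension."""
    n = len(low)
    m = points_per_dimension(action_n, n)
    axes = [
        # m evenly spaced points per dimension; the midpoint if m == 1.
        [lo + (hi - lo) * i / (m - 1) for i in range(m)] if m > 1 else [(lo + hi) / 2]
        for lo, hi in zip(low, high)
    ]
    # Cartesian product of the per-dimension points gives m**n actions.
    return [list(point) for point in itertools.product(*axes)]
```

For example, `uniform_action_grid(9, [-1.0, -1.0], [1.0, 1.0])` yields the 9 points of a 3 x 3 grid, while `action_n = 10` with a 2-dimensional action space is rejected.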

#### **Possible values for the required keys in a config file**

*env_name* possible values are
- EscapeFromZero
- GetIntoCircle
- GetIntoSquare
- HomicidalChauffeur
- Interception
- InvertedPendulum
- Swimmer
- HalfCheetah

*alg_name* possible values for *learning* are
- 2xDQN
- RARL
- NashDQN
- MADDPG
- MADQN
- CounterDQN
- IDQN
- DIDQN

*alg_name* possible values for *testing* are
- DDQN
- DDPG
- CEM
- SB3A2C
- SB3PPO
- SB3SAC

### **Paper experiments**

The config files used to obtain the paper results are in the <code>configs</code> folder, one for each environment (game).

To reproduce the experiments from the paper, run the corresponding config files from the <code>configs</code> folder. Be careful, however: one such config runs 40 experiments, which takes about 120 hours when they run sequentially (the default). To check the experiment results quickly, you can instead run a single seed with a single algorithm, as in <code>configs/config_example.json</code>.
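If you have several CPU cores or machines, one option is to split a paper config into single-experiment configs and run them as separate processes. The sketch below assumes, without confirmation from the code, that every combination of seed, environment, and *learning* algorithm is an independent experiment and that runs sharing a *result_path* do not overwrite each other's files; the function names and output file names are hypothetical.

```python
import copy
import itertools
import json
from pathlib import Path


def split_config(cfg):
    """Yield one config per (seed, env, learning algorithm) combination."""
    for seed, env, alg in itertools.product(cfg["seeds"], cfg["envs"], cfg["learning"]):
        single = copy.deepcopy(cfg)  # keep result_path and testing unchanged
        single["seeds"] = [seed]
        single["envs"] = [env]
        single["learning"] = [alg]
        yield single


def write_split_configs(config_path, out_dir):
    """Write the split configs to out_dir and return their paths."""
    cfg = json.loads(Path(config_path).read_text())
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, single in enumerate(split_config(cfg)):
        path = out / f"part_{i:03d}.json"
        path.write_text(json.dumps(single, indent=2))
        paths.append(path)
    return paths
```

Each generated file can then be passed to <code>run.py</code> in its own process, for example <code>python run.py --config_path split/part_000.json</code>.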

## Visualization

To visualize experiments, run the command 

```    
python show.py --data_path <path to data> --exp_subname <string that should be in the experiment name> --show_tests <0 (default) or 1>
```

For example,
```    
python show.py --data_path data --exp_subname GetIntoSquare
```
or
```    
python show.py --data_path data_example --show_tests 1
```

### Paper results

Visualizations of the paper results are available in the corresponding files in the <code>results</code> folder and in the figure below. For detailed information about the training and evaluation processes, pass <code>--show_tests 1</code>.

| ![EscapeFromZero](pics/pic_res_EscapeFromZero.png) | ![GetIntoCircle](pics/pic_res_GetIntoCircle.png) |    
|:----:|:----:|    
| *EscapeFromZero* | *GetIntoCircle* |    
| ![GetIntoSquare](pics/pic_res_GetIntoSquare.png) | ![HomicidalChauffeur](pics/pic_res_HomicidalChauffeur.png) |    
| *GetIntoSquare* | *HomicidalChauffeur* |
| ![Interception](pics/pic_res_Interception.png) | ![InvertedPendulum](pics/pic_res_InvertedPendulum.png) |    
| *Interception* | *InvertedPendulum* |
| ![Swimmer](pics/pic_res_Swimmer.png) | ![HalfCheetah](pics/pic_res_HalfCheetah.png) |    
| *Swimmer* | *HalfCheetah* |
