# Bisimulation for Fairness in RL
## Installation
* This code requires the `fair_gym` package, which is also included in the supplemental material, so install that package first.
* Then install this package's requirements:
```cmd
pip install -r requirements.txt
```
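
Before training, you can check that `fair_gym` is visible to the Python interpreter you plan to use. The snippet below is a minimal sketch and is not part of this repository. It assumes the package's import name is also `fair_gym`. It only uses the standard-library `importlib.util.find_spec`, so it does not actually import the package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module named `module_name` can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the supplemental package is importable as `fair_gym`.
    status = "found" if is_installed("fair_gym") else "NOT found; install it first"
    print(f"fair_gym: {status}")
```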

## Usage
### Lending Environment
* To train PPO+Bisimulator on the lending environment, use the command below. The `--agent` argument accepts the following values:
    * PPO+Bisimulator: `bisim_rew_dyn`
    * PPO+Bisimulator (Reward only): `bisim_rew`
    * Standard PPO: `ppo`
    * Max-util: `max_util`
    * Equality of Opportunity (EO): `eo`
    ```cmd
    python main.py --env lending --agent bisim_rew_dyn --max_episode_steps 10000 --rew_coef 5 --anneal_lr --clip_vloss
    ```
* To train DQN+Bisimulator on the lending environment, use the command below. The `--agent` argument accepts the following values:
    * DQN+Bisimulator: `dqn_bisim_rew_dyn`
    * DQN+Bisimulator (Reward only): `dqn_bisim_rew`
    * Standard DQN: `dqn`
    ```cmd
    python main_dqn.py --env lending --agent dqn_bisim_rew_dyn --max_episode_steps 10000 --rew_coef 1.5 --anneal_lr
    ```
* We use separate scripts to train the remaining baselines because they require different training loops:
    * To train ELBERT-PO use:
    ```cmd
    python train_elbert.py --env lending --agent elbert --max_episode_steps 10000 --anneal_lr --clip_vloss
    ```
    * To train Lagrangian PPO (Lag-PPO) use: 
    ```cmd
    python train_lagppo.py --env lending --agent lagppo --max_episode_steps 10000 --anneal_lr --clip_vloss
    ```
    * To train Advantage-regularized PPO (A-PPO), use:
    ```cmd
    python train_appo.py --env lending --agent appo --max_episode_steps 10000 --anneal_lr --clip_vloss
    ```

### College Admission Environment
* To train PPO+Bisimulator on the college environment, use the command below. The `--agent` argument accepts the following values:
    * PPO+Bisimulator: `bisim_rew_dyn`
    * PPO+Bisimulator (Reward only): `bisim_rew`
    * Standard PPO: `ppo`
    * Supervised Learning Classifier: `classifier`
    ```cmd
    python main.py --env college --agent bisim_rew_dyn --max_episode_steps 1000 --rew_coef 5 --anneal_lr --clip_vloss
    ```
* To train DQN+Bisimulator on the college environment, use the command below. The `--agent` argument accepts the following values:
    * DQN+Bisimulator: `dqn_bisim_rew_dyn`
    * DQN+Bisimulator (Reward only): `dqn_bisim_rew`
    * Standard DQN: `dqn`
    ```cmd
    python main_dqn.py --env college --agent dqn_bisim_rew_dyn --max_episode_steps 1000 --rew_coef 1.5 --anneal_lr
    ```
* We use separate scripts to train the remaining baselines because they require different training loops:
    * To train ELBERT-PO use:
    ```cmd
    python train_elbert.py --env college --agent elbert --max_episode_steps 1000 --anneal_lr --clip_vloss
    ```
    * To train Lagrangian PPO (Lag-PPO) use: 
    ```cmd
    python train_lagppo.py --env college --agent lagppo --max_episode_steps 1000
    ```
    * To train Advantage-regularized PPO (A-PPO), use:
    ```cmd
    python train_appo.py --env college --agent appo --max_episode_steps 1000
    ```
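
To run every configuration listed above in one go, the documented commands can be collected into a single launcher. This is a hypothetical sketch: `sweep.py` and its `--run` flag are not part of this repository. It assumes the file is saved in the repository root next to `main.py`. The per-agent flags are copied from the commands in this README, including the shorter flag set for Lag-PPO and A-PPO on the college environment. By default it only prints the commands. With `--run` it executes them one after another and stops at the first failure.

```python
"""sweep.py: hypothetical launcher (not part of this repository) that prints or
runs every training command documented in this README."""
import argparse
import shlex
import subprocess
import sys

# Flag sets copied from the README's baseline commands.
LENDING_BASELINE_FLAGS = ["--max_episode_steps", "10000", "--anneal_lr", "--clip_vloss"]
COLLEGE_BASELINE_FLAGS = ["--max_episode_steps", "1000", "--anneal_lr", "--clip_vloss"]

# env -> list of (script, agent names, extra flags), as documented above.
COMMANDS = {
    "lending": [
        ("main.py", ["bisim_rew_dyn", "bisim_rew", "ppo", "max_util", "eo"],
         ["--max_episode_steps", "10000", "--rew_coef", "5", "--anneal_lr", "--clip_vloss"]),
        ("main_dqn.py", ["dqn_bisim_rew_dyn", "dqn_bisim_rew", "dqn"],
         ["--max_episode_steps", "10000", "--rew_coef", "1.5", "--anneal_lr"]),
        ("train_elbert.py", ["elbert"], LENDING_BASELINE_FLAGS),
        ("train_lagppo.py", ["lagppo"], LENDING_BASELINE_FLAGS),
        ("train_appo.py", ["appo"], LENDING_BASELINE_FLAGS),
    ],
    "college": [
        ("main.py", ["bisim_rew_dyn", "bisim_rew", "ppo", "classifier"],
         ["--max_episode_steps", "1000", "--rew_coef", "5", "--anneal_lr", "--clip_vloss"]),
        ("main_dqn.py", ["dqn_bisim_rew_dyn", "dqn_bisim_rew", "dqn"],
         ["--max_episode_steps", "1000", "--rew_coef", "1.5", "--anneal_lr"]),
        ("train_elbert.py", ["elbert"], COLLEGE_BASELINE_FLAGS),
        # The README lists these two without --anneal_lr / --clip_vloss on college.
        ("train_lagppo.py", ["lagppo"], ["--max_episode_steps", "1000"]),
        ("train_appo.py", ["appo"], ["--max_episode_steps", "1000"]),
    ],
}


def build_commands(env, python=sys.executable):
    """Return one argv list per documented (script, agent) pair for `env`."""
    return [
        [python, script, "--env", env, "--agent", agent, *flags]
        for script, agents, flags in COMMANDS[env]
        for agent in agents
    ]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Print or run the documented training commands.")
    parser.add_argument("--env", choices=sorted(COMMANDS), default="lending")
    parser.add_argument("--run", action="store_true", help="execute the commands sequentially")
    args = parser.parse_args(argv)
    for cmd in build_commands(args.env):
        print(shlex.join(cmd))
        if args.run:
            # check=True raises CalledProcessError and stops the sweep on the first failure.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

For example, `python sweep.py --env college` prints the ten college commands. `python sweep.py --env lending --run` trains all eleven lending configurations sequentially. Each command uses the same interpreter that launched the sweep.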
