# Code for the paper __Mirror Descent Actor Critic via Bounded Advantage Learning__

### Installation
Install the environment with `pdm`:
```
$ pdm install
```

### Gridworld Experiment
First, compute an estimate of the optimal regularized value with `--algo=imdvi`:
```
$ cd toy
$ pdm run python3 run_tabular.py --run --record_frequency=5 --seed=0 --algo=imdvi --psi_opt_filename=default --initial_psi=zero --alpha=0.02 --beta=0.99 --num_iterations=300
```
Then run BAL, passing the `psi_final.npy` written by the previous step to `--psi_opt_filename`:
```
$ pdm run python3 run_tabular.py --run --record_curve --record_frequency=5 --seed=0 --algo=bal --bound_f=rtanh --bound_g=rtanh --initial_psi=random --alpha=0.02 --beta=0.99 --num_iterations=300 --psi_opt_filename=./result/zero_imdvi_gam0.99_alpha0.02_beta0.99_nitr2000_randE0.1_berr0.0_nPE1000_dPE0.0001_f5/seed0/psi_final.npy
```
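To repeat the BAL run over several seeds, the command above can be wrapped in a small Python driver. This is a hypothetical sketch, not part of the repository: `bal_command` and `run_seeds` are illustrative names, it assumes it is launched from the repository root (so `run_tabular.py` is found in `toy/`), and it reuses the seed-0 IMDVI estimate as the reference for every BAL seed.

```python
from __future__ import annotations

import subprocess
from typing import Iterable

# Reference estimate from the IMDVI run (seed 0); reused for every BAL seed (assumption).
PSI_OPT = (
    "./result/zero_imdvi_gam0.99_alpha0.02_beta0.99_nitr2000_randE0.1_berr0.0"
    "_nPE1000_dPE0.0001_f5/seed0/psi_final.npy"
)


def bal_command(seed: int, psi_opt: str = PSI_OPT) -> list[str]:
    """Build the BAL gridworld command from this README for one seed."""
    return [
        "pdm", "run", "python3", "run_tabular.py",
        "--run", "--record_curve", "--record_frequency=5",
        f"--seed={seed}",
        "--algo=bal", "--bound_f=rtanh", "--bound_g=rtanh",
        "--initial_psi=random", "--alpha=0.02", "--beta=0.99",
        "--num_iterations=300",
        f"--psi_opt_filename={psi_opt}",
    ]


def run_seeds(seeds: Iterable[int], workdir: str = "toy") -> None:
    """Run BAL once per seed inside workdir; stop at the first failing run."""
    for seed in seeds:
        subprocess.run(bal_command(seed), cwd=workdir, check=True)

# Usage (from the repository root): run_seeds(range(5))
```

Passing the arguments as a list bypasses the shell, so the path needs no quoting, and `check=True` raises `subprocess.CalledProcessError` as soon as one run fails.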


### MuJoCo Experiment
```
$ pdm run python3 train_online_mujoco.py --env=Ant-v4 --algo=mdac --bound_f=rclip --bound_g=rclip
```

### DMC Dog Experiment
```
$ pdm run python3 train_online_mujoco.py --env=dog-walk --algo=mdac --bound_f=rclip --bound_g=clip_t
```

### Adroit Experiment
```
$ pdm run python3 train_online_adroit.py --env=AdroitHandPen-v1 --algo=mdac --bound_f=rclip --bound_g=clip_t
```
