# Robust Deep Reinforcement Learning with Randomized Smoothing
Our code is based on [SADQN](https://github.com/chenhongge/SA_DQN) and [SAPPO](https://github.com/huanzhang12/SA_PPO). We use the [auto_LiRPA](https://github.com/Verified-Intelligence/auto_LiRPA) library to compute convex relaxations of neural networks.
First, clone our repository. It contains two folders, `DSDQN` and `ASPPO`, one for each implementation.
## DSDQN (Denoised - Smoothed Deep Q Network)
### Setup
```
cd DSDQN
git clone https://github.com/KaidiXu/auto_LiRPA
cd auto_LiRPA
git checkout f4492caea9d7f1e6bcee52e70dbcda6b747f43da
python setup.py install
cd ..
pip install -r requirements.txt
```
### Pretrained Models
Our pretrained models can be found in `models/`. `*_denoiser.pth` are the models trained with Denoised Smoothing.
We set the smoothing variance sigma to 0.1 for Pong, 0.12 for Freeway, and 0.1 for RoadRunner.
(Note that we do not provide pretrained models for our DSRAD, DSCVX, and DASDQN implementations because of the size limit on supplementary materials.)
### Our new PGD Attack
To test our DSDQN agents against our new l_inf PGD attack with budget epsilon=0.01 in the Pong environment, run
```
python test_attack_denoiser.py --config config/Pong_denoiser.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.01 test_config:attack_config:norm_type=l_inf test_config:num_episodes=5 training_config:use_async_env=false
```
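The `key:subkey=value` arguments in the command above look like overrides for nested fields of the JSON config file. The sketch below shows one way such overrides could be mapped onto a nested dictionary. It is an assumption about the argument format, not the repository's actual parser, and `apply_overrides` is a hypothetical helper.

```python
import json

def apply_overrides(config, overrides):
    """Apply 'a:b:c=value' style overrides to a nested dict, in place."""
    for item in overrides:
        path, _, raw = item.partition("=")
        keys = path.split(":")
        node = config
        # Walk (or create) the intermediate dictionaries.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        try:
            # Numbers and booleans are parsed the way JSON would parse them.
            value = json.loads(raw)
        except json.JSONDecodeError:
            # Anything else (for example l_inf) is kept as a plain string.
            value = raw
        node[keys[-1]] = value
    return config

# Hypothetical starting config; the real JSON files contain more fields.
config = {"test_config": {"sigma": 0.0, "attack_config": {"params": {}}}}
apply_overrides(config, [
    "test_config:sigma=0.1",
    "test_config:smooth=true",
    "test_config:attack_config:params:epsilon=0.01",
    "test_config:attack_config:norm_type=l_inf",
])
```

Under this reading, any field in the config file can be changed from the command line without editing the JSON.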
To test our DSRAD agents against our new l_inf PGD attack with budget epsilon=0.01 in the Pong environment, run
```
python test_attack_denoiser.py --config config/Pong_rad_denoiser.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.01 test_config:attack_config:norm_type=l_inf test_config:num_episodes=5 training_config:use_async_env=false
```
To test our DSCVX agents against our new l_inf PGD attack with budget epsilon=0.01 in the Pong environment, run
```
python test_attack_denoiser.py --config config/Pong_cov_denoiser.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.01 test_config:attack_config:norm_type=l_inf test_config:num_episodes=5 training_config:use_async_env=false
```
To test our DASDQN agents against our new l_inf PGD attack with budget epsilon=0.01 in the Pong environment, run
```
python test_attack_denoiser.py --config config/Pong_denoiser_adv.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.01 test_config:attack_config:norm_type=l_inf test_config:num_episodes=5 training_config:use_async_env=false
```
Set `test_config:attack_config:norm_type=l_2` to switch from the L-infinity attack to the L-2 attack. `sigma=0.1` is the smoothing level (use 0.1 for Pong, 0.12 for Freeway, and 0.1 for RoadRunner), `m=100` is the number of Monte Carlo samples, and `epsilon=0.01` is the attack budget.
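To make the roles of `sigma` and `m` concrete, here is a minimal sketch of Monte Carlo smoothing for action selection. It assumes that `sigma` is used as the standard deviation of Gaussian input noise and that Q-values are averaged over `m` noisy copies of the state. `smoothed_action` and `toy_q` are hypothetical names; in the real pipeline, `q_fn` would stand for the denoiser followed by the Q-network.

```python
import random

def smoothed_action(q_fn, state, sigma=0.1, m=100, seed=0):
    """Return (greedy action, mean Q-values) under Gaussian input noise."""
    rng = random.Random(seed)
    totals = None
    for _ in range(m):
        # Perturb every state feature with independent N(0, sigma^2) noise.
        noisy_state = [x + rng.gauss(0.0, sigma) for x in state]
        q_values = q_fn(noisy_state)
        if totals is None:
            totals = [0.0] * len(q_values)
        totals = [t + q for t, q in zip(totals, q_values)]
    # Monte Carlo estimate of the smoothed Q-value of each action.
    means = [t / m for t in totals]
    best = max(range(len(means)), key=lambda a: means[a])
    return best, means

def toy_q(state):
    # Hypothetical two-action Q-function used only for this illustration.
    s = sum(state)
    return [s, 1.0 - s]

action, mean_q = smoothed_action(toy_q, [0.9, 0.0], sigma=0.1, m=100)
```

A larger `m` gives a less noisy estimate at the cost of `m` forward passes per step, which is why the attack commands use `m=100` while the long reward-bound runs use `m=1`.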
### Certified radius
To evaluate the certified radius of each action of DSDQN, run
```
python certify_r_denoiser.py --config config/Pong_denoiser.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=500 test_config:num_episodes=1
```
To evaluate the certified radius of each action of DSRAD, run
```
python certify_r_denoiser.py --config config/Pong_rad_denoiser.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=500 test_config:num_episodes=1
```
To evaluate the certified radius of each action of DSCVX, run
```
python certify_r_denoiser.py --config config/Pong_cov_denoiser.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=500 test_config:num_episodes=1
```
To evaluate the certified radius of each action of DASDQN, run
```
python certify_r_denoiser.py --config config/Pong_denoiser_adv.json test_config:m=100 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=500 test_config:num_episodes=1
```
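For intuition about what a certified radius measures, the sketch below implements the standard L-2 radius from the randomized smoothing literature for classifiers, `sigma / 2 * (Phi^-1(p_top) - Phi^-1(p_runner_up))`. This is only an illustration of the general idea: `certify_r_denoiser.py` may compute its per-action radius differently, and `l2_certified_radius` is a hypothetical helper.

```python
from statistics import NormalDist

def l2_certified_radius(p_top, p_runner_up, sigma):
    """Classifier-style L-2 radius for Gaussian smoothing.

    p_top:       lower bound on the probability of the top choice
    p_runner_up: upper bound on the probability of the runner-up
    sigma:       standard deviation of the smoothing noise
    """
    if not (0.0 < p_runner_up <= p_top < 1.0):
        raise ValueError("need 0 < p_runner_up <= p_top < 1")
    inv_cdf = NormalDist().inv_cdf  # inverse standard normal CDF
    return 0.5 * sigma * (inv_cdf(p_top) - inv_cdf(p_runner_up))

radius = l2_certified_radius(0.9, 0.1, sigma=0.1)
```

The radius grows with `sigma` and with the gap between the two probabilities, and it is zero when the two choices are tied.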
### Reward lower bound
To evaluate the reward lower bound of DSDQN, run
```
python test_attack_denoiser.py --config config/Pong_denoiser.json test_config:m=1 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.0 test_config:attack_config:norm_type=l_inf test_config:num_episodes=1000 training_config:use_async_env=false
```
To evaluate the reward lower bound of DSRAD, run
```
python test_attack_denoiser.py --config config/Pong_rad_denoiser.json test_config:m=1 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.0 test_config:attack_config:norm_type=l_inf test_config:num_episodes=1000 training_config:use_async_env=false
```
To evaluate the reward lower bound of DSCVX, run
```
python test_attack_denoiser.py --config config/Pong_cov_denoiser.json test_config:m=1 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.0 test_config:attack_config:norm_type=l_inf test_config:num_episodes=1000 training_config:use_async_env=false
```
To evaluate the reward lower bound of DASDQN, run
```
python test_attack_denoiser.py --config config/Pong_denoiser_adv.json test_config:m=1 test_config:sigma=0.1 test_config:smooth=true test_config:max_frames_per_episode=50000 test_config:attack_config:params:epsilon=0.0 test_config:attack_config:norm_type=l_inf test_config:num_episodes=1000 training_config:use_async_env=false
```
Then run
```
python reward_bound.py
```
### Training
To train DSDQN in the Pong environment, run
```
python train_denoiser.py --config config/Pong_denoiser.json
```
To train DSRAD in the Pong environment, run
```
python train_denoiser.py --config config/Pong_rad_denoiser.json
```
To train DSCVX in the Pong environment, run
```
python train_denoiser.py --config config/Pong_cov_denoiser.json
```
To train DASDQN in the Pong environment, run
```
python train_denoiser_adv.py --config config/Pong_denoiser_adv.json
```
The results will be saved to `PongNoFrameskip-v4_denoiser_0.1`. There you can find the denoiser model, named `denoiser_frame_300000.pth`, and the original DQN model used to train the denoiser, named `dqn_frame_300000.pth`.
## ASPPO (Adversarial - Smoothed Proximal Policy Optimization)
### Setup
```
cd ASPPO
git clone https://github.com/KaidiXu/auto_LiRPA
cd auto_LiRPA
git checkout 389dc72fcff606944dca0504cc77f52fef024c4e
python setup.py install
cd ..
pip install -r requirements.txt
cd src
```
Then follow the instructions [here](https://github.com/openai/mujoco-py#install-mujoco) to install MuJoCo.
### Pretrained Models
Our pretrained models can be found in `walker_0.2_models/`, `hopper_0.3_models/`, and `humanoid_0.4_models/`. Each folder contains 15 pretrained models. At test time, we report the median result over the 15 models.
Note that PPO training has large variance across runs. For a fair comparison, it is necessary to train each environment at least 15 times and report the median reward among the 15 agents.
We set the smoothing variance sigma to 0.2 for Walker, 0.3 for Hopper, and 0.4 for Humanoid during training.
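The median-over-15-agents protocol described above can be summarized with a few lines of Python. This is a minimal sketch: `median_reward` is a hypothetical helper, and the numbers are placeholders rather than results from our agents.

```python
from statistics import median

def median_reward(rewards_per_agent):
    """Return the median of one reward value per trained agent."""
    if not rewards_per_agent:
        raise ValueError("need at least one agent reward")
    return median(rewards_per_agent)

# Placeholder values only (one per agent); these are NOT measured rewards.
placeholder_rewards = [float(100 * i) for i in range(1, 16)]
summary = median_reward(placeholder_rewards)
```

Reporting the median rather than the best run keeps a single lucky or unlucky training seed from dominating the comparison.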
### Attack
To test the clean reward of our first ASPPO agent in the Walker environment, run
```
python test.py --config-path config_walker_adv.json --load-model walker_0.2_models/walker_smoothing_adv_1.model --deterministic
```
To attack our first ASPPO agent with the random attack, critic attack, MAD (Maximal Action Difference) attack, and RS (Robust Sarsa) attack in the Walker environment, run
```
source scan_attacks.sh
scan_attacks walker_0.2_models/walker_smoothing_adv_1.model config_walker_adv.json sarsa_walker_adv_1
```
To evaluate other agents, change the corresponding number in the model name, for example `walker_0.2_models/walker_smoothing_adv_2.model` and `sarsa_walker_adv_2`.
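Since only the agent number changes between runs, the clean-reward commands for all agents can be generated in a loop. The sketch below assumes the Walker models are numbered 1 through 15, following the naming pattern shown above. `walker_eval_commands` is a hypothetical helper, not part of the repository.

```python
def walker_eval_commands(num_agents=15):
    """Build one clean-reward test command per pretrained Walker agent."""
    commands = []
    for i in range(1, num_agents + 1):
        # Assumed naming pattern: walker_smoothing_adv_<i>.model
        model = f"walker_0.2_models/walker_smoothing_adv_{i}.model"
        commands.append([
            "python", "test.py",
            "--config-path", "config_walker_adv.json",
            "--load-model", model,
            "--deterministic",
        ])
    return commands

commands = walker_eval_commands()
for cmd in commands:
    # Print each command; it can then be run from the ASPPO/src folder.
    print(" ".join(cmd))
```

Each entry is a list of arguments, so it can also be passed directly to `subprocess.run(cmd, check=True)` when the script is executed from `ASPPO/src`.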
### ADIV
To get the ADIV of our first ASPPO agent in the Walker environment, run
```
python test.py --config-path config_walker_adv.json --load-model walker_0.2_models/walker_smoothing_adv_1.model --deterministic
```
### Reward lower bound
To evaluate the reward lower bound of our first ASPPO agent in the Walker environment, run
```
python test.py --config-path config_walker_adv.json --load-model walker_0.2_models/walker_smoothing_adv_1.model --num-episodes 1000 --testing-m 1 --deterministic
```
Then run
```
python lower_bound.py
```
### Training
To train ASPPO in the Walker environment, run
```
python run.py --config-path config_walker_adv.json
```
The models will be saved to `adv_ppo_walker/agents/<experiment ID>/smoothing_checkpoint/`.
