# README

This is the implementation of the ICLR 4064 submission "COPA: Certifying Robust Policies for Offline Reinforcement Learning against Poisoning Attacks". The code is adapted from the offline RL training repository https://github.com/google-research/batch_rl.

We provide two certification methods (**per-state action certification** and **cumulative reward certification**) for three aggregation protocols (**PARL, TPARL, DPARL**). Below are example commands for running each certification.
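The two certifications differ only in the module that is invoked (`batch_rl.fixed_replay.test` or `batch_rl.fixed_replay.test_reward`), and the three protocols differ only in the `--cert_alg` flags. The sketch below uses a hypothetical helper, `copa_cmd`, to enumerate all six combinations with the example window sizes used in this README. Because the PARL commands pass no `--cert_alg`, it assumes PARL is the default protocol.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the module and protocol flags for one
# (certification, protocol) pair, mirroring the commands in this README.
copa_cmd() {
  local cert="$1" protocol="$2" module flags
  case "$cert" in
    action) module="batch_rl.fixed_replay.test" ;;         # per-state action
    reward) module="batch_rl.fixed_replay.test_reward" ;;  # cumulative reward
    *) echo "unknown certification: $cert" >&2; return 1 ;;
  esac
  case "$protocol" in
    parl)  flags="" ;;                                        # no --cert_alg (assumed default)
    tparl) flags="--cert_alg window --window_size 4" ;;       # fixed window W
    dparl) flags="--cert_alg dynamic --max_window_size 5" ;;  # maximum window W_max
    *) echo "unknown protocol: $protocol" >&2; return 1 ;;
  esac
  # Append the flags only when there are any.
  echo "python -um $module${flags:+ $flags}"
}

# Enumerate all six (certification, protocol) combinations.
for cert in action reward; do
  for protocol in parl tparl dparl; do
    copa_cmd "$cert" "$protocol"
  done
done
```

The remaining options (`--base_dir`, `--model_dir`, `--total_num`, `--agent_name`, gin files and bindings) are shared by all six commands.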

## Certifying Per-State Action

1. PARL

```bash
python -um batch_rl.fixed_replay.test \
			 --base_dir [base_dir] --model_dir [model_dir] \
			 --total_num 50 --max_steps_per_episode 1000 \
			 --agent_name dqn \
			 --gin_files='copa/fixed_replay/configs/dqn.gin' \
			 --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"'
```

where `base_dir` is the directory where experimental logs and results are stored, and `model_dir` is the directory containing the $u$ trained subpolicies.
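Before launching a run, a small pre-flight check can catch a mistyped `model_dir`. The helper `check_dirs` below is hypothetical: it only verifies that `model_dir` exists and creates `base_dir` if needed, and it makes no assumption about how the $u$ subpolicies are laid out inside `model_dir`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a certification run.
# Usage: check_dirs <base_dir> <model_dir>
check_dirs() {
  local base_dir="$1" model_dir="$2"
  # The trained subpolicies must already exist.
  if [ ! -d "$model_dir" ]; then
    echo "model_dir not found: $model_dir" >&2
    return 1
  fi
  # Logs and results are written here; create the directory if missing.
  mkdir -p "$base_dir" || return 1
}
```

For example, `check_dirs ./runs/freeway_parl /path/to/model_dir && python -um batch_rl.fixed_replay.test ...` runs the certification only if the check passes (both paths are placeholders).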

2. TPARL

```bash
python -um batch_rl.fixed_replay.test \
			 --base_dir [base_dir] --model_dir [model_dir] \
			 --cert_alg window --window_size 4 \
			 --total_num 50 --max_steps_per_episode 1000 \
			 --agent_name dqn \
			 --gin_files='copa/fixed_replay/configs/dqn.gin' \
			 --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"'
```

For TPARL, we set `--cert_alg` to `window` and specify the predetermined window size $W$ via `--window_size`.
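To compare several window sizes $W$, each run can be given its own `base_dir` so that logs and results stay separate. This is a minimal sketch: the `tparl_sweep` function, the `BASE_ROOT/W<size>` directory scheme, the window sizes, and the paths are all assumptions. By default (`DRY_RUN=1`) it only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: sweep the TPARL window size W. Each run gets its own base_dir
# (BASE_ROOT/W<size>) so logs and results are not overwritten.
# BASE_ROOT and MODEL_DIR are placeholder paths (assumptions).
BASE_ROOT="${BASE_ROOT:-./runs/tparl}"
MODEL_DIR="${MODEL_DIR:-/path/to/model_dir}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

tparl_sweep() {
  local W
  local -a cmd
  for W in "$@"; do
    cmd=(python -um batch_rl.fixed_replay.test
         --base_dir "$BASE_ROOT/W$W" --model_dir "$MODEL_DIR"
         --cert_alg window --window_size "$W"
         --total_num 50 --max_steps_per_episode 1000
         --agent_name dqn
         --gin_files='copa/fixed_replay/configs/dqn.gin'
         --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"')
    if [ "$DRY_RUN" = 1 ]; then
      printf '%q ' "${cmd[@]}"; echo   # shell-quoted, copy-pasteable command
    else
      "${cmd[@]}"
    fi
  done
}

tparl_sweep 2 4 6
```

Storing the command in a bash array keeps the gin binding, which contains spaces and quotes, as a single argument in both the printed and the executed form.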

3. DPARL

```bash
python -um batch_rl.fixed_replay.test \
			 --base_dir [base_dir] --model_dir [model_dir] \
			 --cert_alg dynamic --max_window_size 5 \
			 --total_num 50 --max_steps_per_episode 1000 \
			 --agent_name dqn \
			 --gin_files='copa/fixed_replay/configs/dqn.gin' \
			 --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"'
```

For DPARL, we set `--cert_alg` to `dynamic` and specify the maximum window size $W_{\rm max}$ via `--max_window_size`.

## Certifying Cumulative Reward

1. PARL

```bash
python -um batch_rl.fixed_replay.test_reward \
			 --base_dir [base_dir] --model_dir [model_dir] \
			 --total_num 50 --max_steps_per_episode 1000 \
			 --agent_name dqn \
			 --gin_files='copa/fixed_replay/configs/dqn.gin' \
			 --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"'
```

where `base_dir` is the directory where experimental logs and results are stored, and `model_dir` is the directory containing the $u$ trained subpolicies.

2. TPARL

```bash
python -um batch_rl.fixed_replay.test_reward \
			 --base_dir [base_dir] --model_dir [model_dir] \
			 --cert_alg window --window_size 4 \
			 --total_num 50 --max_steps_per_episode 1000 \
			 --agent_name dqn \
			 --gin_files='copa/fixed_replay/configs/dqn.gin' \
			 --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"'
```

For TPARL, we set `--cert_alg` to `window` and specify the predetermined window size $W$ via `--window_size`.

3. DPARL

```bash
python -um batch_rl.fixed_replay.test_reward \
			 --base_dir [base_dir] --model_dir [model_dir] \
			 --cert_alg dynamic --max_window_size 5 \
			 --total_num 50 --max_steps_per_episode 1000 \
			 --agent_name dqn \
			 --gin_files='copa/fixed_replay/configs/dqn.gin' \
			 --gin_bindings='atari_lib.create_atari_environment.game_name = "Freeway"'
```

For DPARL, we set `--cert_alg` to `dynamic` and specify the maximum window size $W_{\rm max}$ via `--max_window_size`.
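The game is selected through the gin binding, whose value contains spaces and double quotes, so it must reach Python as a single argument. The sketch below builds that binding per game and runs DPARL reward certification for several games. `MODEL_ROOT`, the one-directory-per-game layout, the helpers `game_binding` and `reward_sweep`, and the game list are all assumptions for illustration; subpolicies must already be trained for every game listed.

```bash
#!/usr/bin/env bash
# Sketch: DPARL cumulative-reward certification for several Atari games.
# Assumes (hypothetically) one directory of trained subpolicies per game
# under MODEL_ROOT; the game list below is only an example.
MODEL_ROOT="${MODEL_ROOT:-/path/to/models}"
BASE_ROOT="${BASE_ROOT:-./runs/reward_dparl}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = execute them

# Gin binding for one game; the inner double quotes are part of the gin value.
game_binding() {
  printf 'atari_lib.create_atari_environment.game_name = "%s"' "$1"
}

reward_sweep() {
  local game
  local -a cmd
  for game in "$@"; do
    cmd=(python -um batch_rl.fixed_replay.test_reward
         --base_dir "$BASE_ROOT/$game" --model_dir "$MODEL_ROOT/$game"
         --cert_alg dynamic --max_window_size 5
         --total_num 50 --max_steps_per_episode 1000
         --agent_name dqn
         --gin_files='copa/fixed_replay/configs/dqn.gin'
         --gin_bindings="$(game_binding "$game")")   # stays one argument
    if [ "$DRY_RUN" = 1 ]; then
      printf '%q ' "${cmd[@]}"; echo   # shell-quoted, copy-pasteable command
    else
      "${cmd[@]}"
    fi
  done
}

reward_sweep Freeway Breakout
```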
