This repository contains the codebase used to perform the experiments in **Differentially Private Deep Model-Based Reinforcement Learning**.

The codebase is based on the following PyTorch re-implementation of MOPO: https://github.com/junming-yang/mopo.

# Dependencies

- MuJoCo 2.0
- Gym 0.25.0
- PyTorch 1.11.0
- DM Control 1.0.22 (```pip install dm_control```)
- dm2gym (https://github.com/zuoxingdong/dm2gym)

# Usage

## Train an agent

Run the commands below to train a private RL agent with PriMORL. The argument ```--noise-multiplier``` controls
the strength of the privacy guarantee (a higher noise multiplier means stronger privacy), and the argument
```--max-grad-norm``` sets the global clipping norm C.

```
# Cartpole-Balance
python train.py --task dm2gym:CartpoleBalance-v0 --rollout-length 20 --reward-penalty-coef 2.0 --hold-out-ratio 0.1 --epoch 60 --poisson-q 0.01 --max-grad-norm 0.01 --noise-multiplier 0.25 --load-dataset --seed 0

# Cartpole-Swingup
python train.py --task dm2gym:CartpoleSwingup-v0 --rollout-length 20 --reward-penalty-coef 2.0 --hold-out-ratio 0.1 --epoch 60 --poisson-q 0.01 --max-grad-norm 0.01 --noise-multiplier 0.25 --load-dataset --seed 0

# Pendulum
python train.py --task Pendulum-v1 --rollout-length 30 --reward-penalty-coef 2.0 --hold-out-ratio 0.1 --epoch 60 --poisson-q 0.01 --max-grad-norm 0.01 --noise-multiplier 0.25 --load-dataset --seed 0
```
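
As a rough illustration of what ```--max-grad-norm``` and ```--noise-multiplier``` control, the sketch below shows the generic clip-and-noise step used in DP-SGD-style training: each update is clipped to L2 norm C, the clipped updates are summed, and Gaussian noise with standard deviation `noise_multiplier * C` is added. This is an assumption-laden toy example, not the code in ```train.py```; the function names `clip_to_norm` and `private_sum` are hypothetical.

```python
import math
import random


def clip_to_norm(update, max_grad_norm):
    """Rescale `update` so that its L2 norm is at most `max_grad_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_sum(updates, max_grad_norm, noise_multiplier, rng):
    """Clip every update, sum them, then add Gaussian noise.

    The noise standard deviation is noise_multiplier * max_grad_norm,
    so a larger noise multiplier hides each contribution more strongly.
    """
    dim = len(updates[0])
    clipped = [clip_to_norm(u, max_grad_norm) for u in updates]
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    std = noise_multiplier * max_grad_norm
    return [t + rng.gauss(0.0, std) for t in total]


# Toy updates (hypothetical values), using the same C and noise multiplier
# as the commands above.
rng = random.Random(0)
updates = [[0.3, -0.4], [0.001, 0.002], [-3.0, 4.0]]
noisy = private_sum(updates, max_grad_norm=0.01, noise_multiplier=0.25, rng=rng)
print(noisy)
```

Because the noise scale is tied to C, lowering ```--max-grad-norm``` reduces the absolute noise but also clips more of each update.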

## Datasets

In the paper, we collect and use very large datasets (about 30K episodes) to study private RL. Because of the size of
these datasets, we instead provide smaller, pre-processed datasets (1K to 3K episodes) in the ```datasets``` folder for
running experiments. For the algorithm to behave similarly to the paper, we compensate by increasing the sampling ratio
(command argument ```--poisson-q```) from 0.001 to 0.01, which keeps roughly the same number of sampled episodes at each
training round, and by increasing the hold-out ratio (```--hold-out-ratio```) from 0.01 to 0.1. The data collection process is described in the paper.
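
The arithmetic behind this rescaling can be checked with a small sketch. It assumes that Poisson sampling keeps each episode independently with probability q, so the expected number of sampled episodes is q times the dataset size; the helper names `poisson_sample` and `expected_count` are hypothetical and not part of the codebase.

```python
import random


def poisson_sample(n_episodes, q, rng):
    """Return the indices of episodes kept independently with probability q."""
    return [i for i in range(n_episodes) if rng.random() < q]


def expected_count(n_episodes, ratio):
    """Expected number of episodes selected when using the given ratio."""
    return n_episodes * ratio


rng = random.Random(0)
# Paper setting (about 30K episodes) versus a provided 3K-episode dataset.
for n, q, hold_out in [(30_000, 0.001, 0.01), (3_000, 0.01, 0.1)]:
    batch = poisson_sample(n, q, rng)
    print(
        f"N={n} q={q}: expected sampled episodes {expected_count(n, q):.0f}, "
        f"drawn this round {len(batch)}, "
        f"expected hold-out episodes {expected_count(n, hold_out):.0f}"
    )
```

Both settings give the same expected counts for a 3K-episode dataset; for the smaller provided datasets, the expected number of sampled episodes is proportionally lower.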

## Logging, saving and loading models

Results are saved in the ```log``` folder by default. The dynamics model is also saved automatically by default,
in the ```exp_name/models/ite_dynamics_model``` folder of the corresponding experiment.

To re-use an already trained model and skip directly to policy training, save the model in the folder
```./models/saved_models``` under the name ```model_name.pt```, and add the flag 
```--load-model-name model_name``` to the ```python train.py``` command line.

## Compute privacy guarantees

The following command computes the privacy budget as reported in the paper.

```
python dp_accountant.py --noise_multiplier 0.25 --sampling_method poisson --delta 1e-5 --sampling_ratio 0.001 --n_rounds 5000
```