# PolicyGuard: Towards Test-time Step-level Backdoor Defense for RL Agents




## Code structure

The GP model.
- `DGP_XRL.py`: RNN-based deep kernel learning model.
Each object has the following functions:
- `train()`: trains the approximation model to capture the correlation between the input trajectories and the final rewards, and displays the training accuracy. It takes a weight parameter (a tensor of class weights, default `None`) that can be used to handle class imbalance.
- `test()`: test the trained model accuracy on the test trajectories.
- `get_x`: compute the GP Variance for the input time step along with the Pseudo Trajectories.
- `save`: save the trained model.
- `load`: load a well trained model.
- Please install the dependencies listed in `requirements.txt`. We have installed the `Atari` and `Mujoco` games and envs in `src/multiagent-competition` and `src/baby-a3c`. Due to space constraints, please download the weights from their GitHub repositories. We stored a trained GP model for RTGH in `src/models_g`.

Key parameters (descriptions of most parameters can be found in the inline comments):
- `encoder_type`: 'CNN' or 'MLP'.
  - Use 'CNN' if the observations are environment frame snapshots (images). A CNN transforms the input observations ([n_traj, seq_len, input_channels, input_dim, input_dim], torch.float32) into observation encodings ([n_traj, seq_len, encode_dim]), and an embedding layer transforms the categorical actions ([n_traj, seq_len], torch.long) into action embeddings. The observation encoding and action embedding are then concatenated to produce the final hidden representation. Note that this CNN structure is designed for Atari games: it currently only supports input_dim=80/84 and does not support continuous actions. If using a different input dim, change the '4' in `self.cnn_out_dim = 4 * 4 * 16 + embed_dim` in line 54 of `rnn_utils.py` to the current encoded dim. If using continuous actions, change the embedding layer in line 35 of `rnn_utils.py` to an MLP.
  - Use 'MLP' if the observations and actions are feature vectors. It concatenates the observations and actions and then runs an MLP.
- `likelhood_type`: 'classification' or 'regression'. Use 'classification' if the final rewards are discrete; otherwise use 'regression'.
- `rnn_cell_type`: 'GRU' or 'LSTM', default as 'GRU' for better efficiency.
- `hiddens`: the MLP structure, or the RNN hidden dim in the CNN+RNN. We suggest using the policy network structure and keeping it the same for all the explainers.
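
When changing `input_dim` for the 'CNN' encoder, the new encoded spatial dim can be worked out with the standard convolution output-size formula, `floor((n + 2p - k) / s) + 1`. The sketch below is illustrative only: the layer list is the well-known DQN-style stack, used as an assumption and not read from `rnn_utils.py`, and `embed_dim = 16` is likewise a placeholder. Substitute the kernel, stride, and padding values actually used in `rnn_utils.py`.

```python
def conv_out_dim(size, layers):
    """Spatial size after a stack of square conv layers.

    `layers` is a list of (kernel, stride, padding) tuples.
    """
    for kernel, stride, padding in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


# Hypothetical layer specs (kernel, stride, padding), NOT taken from rnn_utils.py.
example_layers = [(8, 4, 0), (4, 2, 0), (3, 1, 0)]

side_84 = conv_out_dim(84, example_layers)
side_80 = conv_out_dim(80, example_layers)

# Mirrors the form of `self.cnn_out_dim = 4 * 4 * 16 + embed_dim`,
# with the spatial size replaced by the computed one.
embed_dim = 16  # placeholder value
cnn_out_dim = side_84 * side_84 * 16 + embed_dim
print(side_84, side_80, cnn_out_dim)
```

The computed side length is the value that would replace the '4' in the `cnn_out_dim` expression for the chosen input size.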

## Usage - workflow

- Step 1: set up the game env, load the pretrained agent, and collect trajectories by running the agent in the environment.
  - Note 1: Run the agent and save the trajectories the first time you collect them, then load the saved trajectories for later use (make sure all the models are trained on the same set of trajectories).

  - Note 3: Save the original observations and actions in `src/traj_dat/`.
  - Note 4: the trajectories have varied lengths, so pad them to the same length. Pad at the front, not the end. Note that padding with 0 can cause confusion with rewards and categorical actions.
  - Note 5: control the trajectory length with a parameter such as `max_ep_len` and discard the trajectories that do not finish at the maximum length.
  - Note 6: save each trajectory as a `.npy` file.
- Step 3: load and preprocess the trajectories.
  - Note 1: replace the padded values in the observations with `0` and preprocess them into states using the policy network preprocessing method.
- Step 4: train the GP model with `python src/train_exp.py`.
- Step 5: run our method with `python src/get_x.py`.
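
The padding and saving notes in Step 1, and the zeroing of padded observations in Step 3, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `front_pad`, the sentinel pad value `-1`, the toy shapes, and the file name are all hypothetical, and the example writes to a temporary directory instead of `src/traj_dat/` so that it runs anywhere.

```python
import os
import tempfile

import numpy as np


def front_pad(seq, max_len, pad_value):
    """Left-pad an array of shape (T, ...) to (max_len, ...) along axis 0."""
    seq = np.asarray(seq)
    if seq.shape[0] > max_len:
        raise ValueError("trajectory is longer than max_len")
    out = np.full((max_len,) + seq.shape[1:], pad_value, dtype=seq.dtype)
    out[max_len - seq.shape[0]:] = seq  # real steps sit at the end
    return out


max_ep_len = 5                           # hypothetical maximum episode length
actions = np.array([2, 0, 1])            # 3 steps of categorical actions
obs = np.ones((3, 4), dtype=np.float32)  # 3 steps of 4-dim observations

# Assumption: pad with a sentinel (-1) so padding is not mistaken for action 0.
padded_actions = front_pad(actions, max_ep_len, pad_value=-1)
padded_obs = front_pad(obs, max_ep_len, pad_value=-1)

# Step 3 preprocessing: replace the padded observation steps with 0.
n_pad = max_ep_len - len(actions)
padded_obs[:n_pad] = 0

# Save the trajectory as a .npy file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "traj_0_actions.npy")
    np.save(path, padded_actions)
    reloaded = np.load(path)

print(padded_actions, reloaded)
```

Tracking the number of padded steps (`n_pad`) avoids relying on the sentinel value to locate padding in the observations, where a real observation could coincide with it.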

- Hint: to adapt the dims and configurations to Atari games, follow `atari_run/train_exp.py`.