# Revised codebase: typo in original submission has been fixed

Reviewer tKs2 rightly noticed a typo in the submitted code. The original code used to generate the results in the paper did not have this typo; it was introduced when we cleaned up and refactored the code for the ICLR submission. We have fixed it in this repo (a one-line change) and re-ran the new codebase to confirm that it reproduces the results reported in the paper.

**The paper reports a mean normalized score of 0.78** using 10 seeds across 12 evaluation tasks (120 total runs). With this codebase, we pre-trained CIC for 2M steps and finetuned for 100K steps with 3 seeds per task (seeds 1, 2, 3; 36 total runs). The finetuning results achieved **a mean normalized score of 0.82** across the 12 evaluation tasks. This shows that this codebase reproduces the results reported in the paper and uses the correct sampling strategy.

We show scores for all runs on this [Anonymous Website](https://sites.google.com/view/iclrcic/home).

## The typo 

In lines 48-49 of `agent/cic.py`, the queries and keys were accidentally swapped. We changed this back to the correct form:

```python
# queries are skills (fixed in contrastive loss)
query = self.skill_net(skill)

# keys are transitions (used to sample negatives in contrastive loss)
key = self.pred_net(torch.cat([state,next_state],1))
```
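
To illustrate why the roles of queries and keys matter, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is not necessarily the exact loss used in `agent/cic.py`: the function `info_nce_loss`, the temperature value, the linear stand-ins for `skill_net` and `pred_net`, and all layer sizes are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def info_nce_loss(query: torch.Tensor, key: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Generic InfoNCE-style loss: row i of `query` should match row i of `key`.

    All other rows of `key` in the batch act as negatives for query i.
    """
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    logits = query @ key.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(query.shape[0])  # positives lie on the diagonal
    return F.cross_entropy(logits, labels)


# Hypothetical stand-ins for the repository's networks (sizes are made up).
torch.manual_seed(0)
skill_dim, obs_dim, hidden_dim, batch_size = 64, 24, 128, 8
skill_net = torch.nn.Linear(skill_dim, hidden_dim)
pred_net = torch.nn.Linear(2 * obs_dim, hidden_dim)

skill = torch.randn(batch_size, skill_dim)
state = torch.randn(batch_size, obs_dim)
next_state = torch.randn(batch_size, obs_dim)

# queries are skills, keys are transitions
query = skill_net(skill)
key = pred_net(torch.cat([state, next_state], dim=1))

loss = info_nce_loss(query, key)
print(loss.item())
```

In this sketch each skill embedding is held fixed as the query while the other transitions in the batch supply the negatives, which matches the roles described in the comments above.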

# Contrastive Intrinsic Control (CIC)

This codebase is built on top of the [Unsupervised Reinforcement Learning Benchmark (URLB) codebase](https://anonymous.4open.science/r/urlb). We include agents for all baselines in the `agent` folder. Our method `CIC` is implemented in `agent/cic.py` and the config is specified in `agent/cic.yaml`.

To pre-train CIC, run the following command:

```sh
python pretrain.py agent=cic domain=walker experiment=YOUR_EXP_NAME
```

To finetune CIC, run the following command. Make sure to specify the directory of your saved snapshots with `YOUR_EXP_NAME`.

```sh
python finetune.py pretrained_agent=cic agent=cic experiment=YOUR_EXP_NAME task=walker_stand snapshot_ts=2000000
```
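
If you prefer to script the two stages, the following is a minimal sketch that only builds and prints the command lines shown above. The helper names `pretrain_command` and `finetune_command` are hypothetical and not part of this repo; the overrides are copied from the commands in this README.

```python
import shlex


def pretrain_command(domain: str, experiment: str, agent: str = "cic") -> list[str]:
    """Build the pre-training command as an argument list."""
    return ["python", "pretrain.py", f"agent={agent}", f"domain={domain}",
            f"experiment={experiment}"]


def finetune_command(task: str, experiment: str, snapshot_ts: int = 2_000_000,
                     agent: str = "cic") -> list[str]:
    """Build the finetuning command; `experiment` selects the saved snapshots."""
    return ["python", "finetune.py", f"pretrained_agent={agent}", f"agent={agent}",
            f"experiment={experiment}", f"task={task}", f"snapshot_ts={snapshot_ts}"]


experiment = "YOUR_EXP_NAME"
commands = [
    pretrain_command("walker", experiment),
    finetune_command("walker_stand", experiment),
]

# Print the commands instead of running them (a dry run).
for command in commands:
    print(shlex.join(command))
```

To actually launch a stage, each argument list could be passed to `subprocess.run(command, check=True)` from the repository root.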

## Requirements
We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running:
```sh
conda env create -f conda_env.yml
```
After the installation finishes, you can activate your environment with:
```sh
conda activate urlb
```
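
Once the environment is active, a quick sanity check of the GPU setup can save debugging time later. The sketch below assumes PyTorch is among the installed dependencies (the code in this repo uses `torch`); the helper name `describe_gpu_setup` is hypothetical.

```python
import torch


def describe_gpu_setup() -> dict:
    """Collect the CUDA/cuDNN details that PyTorch reports for this environment."""
    return {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda,               # None for CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "device_count": torch.cuda.device_count(),
    }


info = describe_gpu_setup()
for name, value in info.items():
    print(f"{name}: {value}")
```

If `cuda_available` is `False`, training would fall back to (or fail on) the CPU, so it is worth resolving before starting a long pre-training run.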

## Available Domains
We support the following domains.

| Domain | Tasks |
|---|---|
| `walker` | `stand`, `walk`, `run`, `flip` |
| `quadruped` | `walk`, `run`, `stand`, `jump` |
| `jaco` | `reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right` |
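
The table above can be expanded into the full list of evaluation tasks. The sketch below assumes that task names follow the `<domain>_<task>` pattern, as in `task=walker_stand` from the finetuning command; the names for the other domains are inferred from that pattern and may need adjusting.

```python
# Domains and tasks copied from the table above.
DOMAIN_TASKS = {
    "walker": ["stand", "walk", "run", "flip"],
    "quadruped": ["walk", "run", "stand", "jump"],
    "jaco": ["reach_top_left", "reach_top_right",
             "reach_bottom_left", "reach_bottom_right"],
}


def task_names(domain_tasks: dict[str, list[str]]) -> list[str]:
    """Join each domain with its tasks, assuming the `<domain>_<task>` pattern."""
    return [f"{domain}_{task}"
            for domain, tasks in domain_tasks.items()
            for task in tasks]


tasks = task_names(DOMAIN_TASKS)
print(len(tasks))  # 12 evaluation tasks
print(tasks[0])    # walker_stand
```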


## Monitoring
Logs are stored in the `exp_local` folder. To launch tensorboard run:
```sh
tensorboard --logdir exp_local
```
Console output is also available in the following form:
```
| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42
```
A training entry decodes as:
```
F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
```
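
For quick analysis outside TensorBoard, console entries in this format can be parsed with a few lines of Python. This is a minimal sketch based only on the example line above; the function name `parse_log_line` is hypothetical and not part of this repo.

```python
INT_KEYS = {"F", "S", "E", "L"}
FLOAT_KEYS = {"R", "FPS"}


def parse_log_line(line: str) -> dict:
    """Split a console log entry into its mode and typed key/value fields."""
    parts = [part.strip() for part in line.strip().strip("|").split("|")]
    entry = {"mode": parts[0]}
    for field in parts[1:]:
        # Split on the first colon only, so that "T: 0:00:42" keeps its value intact.
        key, _, value = field.partition(":")
        key, value = key.strip(), value.strip()
        if key in INT_KEYS:
            entry[key] = int(value)
        elif key in FLOAT_KEYS:
            entry[key] = float(value)
        else:
            entry[key] = value  # e.g. the elapsed time string
    return entry


example = "| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42"
entry = parse_log_line(example)
print(entry["mode"], entry["F"], entry["R"], entry["T"])
```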
