# rl-ccoa

## Getting started

Please install [JAX](https://jax.readthedocs.io/en/latest/installation.html) by following its installation guide, as well as the packages listed in `requirements.txt` (e.g. with `pip install -r requirements.txt`).

## Experiments

We provide the steps to reproduce most of the results in our main paper.
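Each experiment below follows the same pattern. `wandb sweep` registers the sweep defined in a YAML file and prints its ID, `wandb agent` executes the sweep's runs, and the figure scripts take that sweep ID as their argument. The helper below is a minimal sketch of the last two steps and is not part of this repository. `run_experiment` is a hypothetical name, and `ENTITY/PROJECT` stands for your own W&B entity and project. It also assumes a single agent that returns once the sweep has no runs left, which holds for grid sweeps. For random or Bayesian sweeps you would bound the agent with `--count`. In practice you may prefer to start several agents in parallel, for example one per GPU.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): run an existing sweep, then
# regenerate its figures. Assumes the wandb CLI is installed and `wandb login` was run.
set -euo pipefail

# Usage: run_experiment ENTITY/PROJECT SWEEP_ID FIGURE_SCRIPT...
run_experiment() {
  local project_path="$1"
  local sweep_id="$2"
  shift 2

  # Execute the sweep's runs. A single agent returns once a grid sweep has no runs
  # left; for random/Bayesian sweeps, bound it with e.g. `--count`.
  wandb agent "${project_path}/${sweep_id}"

  # Regenerate each figure from the completed sweep, passing the sweep ID.
  local script
  for script in "$@"; do
    python3 "$script" "$sweep_id"
  done
}
```

For example, after `wandb sweep sweeps/reward_switch.yaml` has printed a sweep ID, `run_experiment ENTITY/PROJECT <WANDB_SWEEP_ID> figures/reward-switch_performance_time_learned.py` would run that sweep and then produce its figure.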

### Linear key-to-door, performance, learnt-models

```
wandb sweep sweeps/performance_asymptotic_learnt.yaml
```
The relevant figures can be generated using the following scripts:
```
python3 figures/performance_asymptotic_length_learned.py <WANDB_SWEEP_ID>
python3 figures/performance_time_envlen103_learned.py <WANDB_SWEEP_ID>
```

### Linear key-to-door, performance, groundtruth-models
```
wandb sweep sweeps/performance_asymptotic_gt.yaml
```
The relevant figures can be generated using the following script:

```
python3 figures/performance_time_envlen103_gt.py <WANDB_SWEEP_ID>
```

### Linear key-to-door, shadow training
```
wandb sweep sweeps/shadow_asymptotic_learnt.yaml
```
The relevant figures can be generated using the following scripts:

```
python3 figures/bias-variance-snr_asymptotic_length_learned.py <WANDB_SWEEP_ID>
python3 figures/bias-variance_aggregate_env103_learned.py <WANDB_SWEEP_ID>
python3 figures/snr_aggregate_env103_learned.py <WANDB_SWEEP_ID>
```

### Reward switching
```
wandb sweep sweeps/reward_switch.yaml
```
The relevant figures can be generated using the following script:

```
python3 figures/reward-switch_performance_time_learned.py <WANDB_SWEEP_ID>
```

### Reward aliasing
```
wandb sweep sweeps/aliasing_exp.yaml
```
The relevant figures can be generated using the following script:

```
python3 figures/performance_time_envlen103_reward-aliasing.py <WANDB_SWEEP_ID>
```
