# Harnessing Bayesian Optimism with Dual Policies in Reinforcement Learning
This repository contains the official implementation of our ICLR 2026 submission.
 
 
## Install
We use virtualenv to manage environments. It usually comes with conda, but can also be installed with `pip install virtualenv`.    
We also provide a script that sets the required environment variables for a faster start.    
  
 - `virtualenv venv`
 - `source setup_dmcontrol.sh` (you may need to edit this script for your system)
 - `pip install -r requirements.txt`


## Experiment tracking
We use wandb to track our experiments.    
To disable wandb tracking, append `no_track=True` to each command.


## Usage
Available tasks are listed in `cfgs/task/`, and available algorithms are listed in `cfgs/algo/`.     
You should find results in `logs/` if you disabled wandb tracking.    
The example commands below alternate between tasks and disable wandb tracking for convenience.   
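
To see which values `task=` and `algo=` accept, the config directories can be listed from the repository root. The helper below is a minimal sketch, not part of this repository: `list_configs` is a hypothetical name, and it assumes each config is a single file whose name without the extension is the value to pass on the command line.

```bash
# list_configs DIR  (hypothetical helper, not shipped with this repository)
# Prints one config name per line by stripping the directory and the extension.
# Assumption: a file such as cfgs/algo/<name>.<ext> is selected with algo=<name>.
list_configs() {
  dir="$1"
  for path in "$dir"/*; do
    # Skip subdirectories and the unexpanded glob of an empty or missing directory.
    [ -f "$path" ] || continue
    name="$(basename "$path")"
    echo "${name%.*}"
  done
}

echo "Tasks:"
list_configs cfgs/task
echo "Algorithms:"
list_configs cfgs/algo
```

If the directories use a different layout (for example nested folders), the output will be incomplete and the files should be checked by hand.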


- TD3
```bash
python main.py task=dmc_quadrupedrun algo=td3 no_track=True no_disc_logging=False
```

- TD3+BOLD (n=2, k=2)
```bash
python main.py task=dmc_cheetahrun algo=td3_boxd2 no_track=True no_disc_logging=False
```

- TD3+BOLD (n=2, k=10)
```bash
python main.py task=dmc_hopperhop algo=td3_boxd2 algo.agent.critic_k_samples=10 no_track=True no_disc_logging=False
```


- SAC
```bash
python main.py task=dmc_humanoidstand algo=sac no_track=True no_disc_logging=False
```

- SAC+BOLD (n=2, k=10)
```bash
python main.py task=dmc_walkerrun algo=sac_boxd2 algo.agent.critic_k_samples=10 no_track=True no_disc_logging=False
```

- SAC+BOLD (n=10, k=10)
```bash
python main.py task=dmc_fingerturnhard algo=sac_boxd10 algo.agent.critic_k_samples=10 no_track=True no_disc_logging=False
```
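
The commands above share one pattern, so a sweep over several tasks and algorithms can be previewed with a small dry-run script. This is a sketch, not a script shipped with this repository: `build_cmd` is a hypothetical helper, it only prints commands, and it assumes any task from the examples can be paired with any algorithm, as the alternating examples suggest.

```bash
# Dry run: print one main.py command per (task, algo) pair without executing it.

# build_cmd TASK ALGO [OVERRIDE...]  (hypothetical helper)
build_cmd() {
  task="$1"
  algo="$2"
  shift 2
  # Join any extra overrides, e.g. algo.agent.critic_k_samples=10.
  extra=""
  for arg in "$@"; do
    extra="$extra $arg"
  done
  printf 'python main.py task=%s algo=%s%s no_track=True no_disc_logging=False\n' \
    "$task" "$algo" "$extra"
}

# Task and algorithm names are taken from the examples above.
for task in dmc_cheetahrun dmc_walkerrun; do
  for algo in td3 td3_boxd2 sac sac_boxd2; do
    build_cmd "$task" "$algo"
  done
done
```

Once the printed commands look right, each line can be copied into the terminal and run as is.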
