# Anonymous Code for *Quality and Diversity Optimization Even from Offline Homogeneous Dataset*

This repository contains the source code for reproducing the experiments in our ICLR submission.

---

## Environment Setup

We recommend using **Python 3.10.0**.  
Install dependencies with:

```bash
pip install -r requirements/requirements.txt
```

---

## Running the Code

Each environment–task pair uses a specific YAML config file located in:

```
configs/offline/edac/<env_name>/<task_name>.yaml
```

For example:

```
configs/offline/edac/halfcheetah/medium_expert_v2.yaml
```

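To train a policy using our method (the command runs in the background via `nohup`, which writes console output to `nohup.out` by default):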

```bash
nohup python3 algorithms/offline/UniqueBehavior.py \
  --config_path configs/offline/edac/halfcheetah/medium_v2.yaml \
  --eta 1.0 \
  --num_critics 30 \
  --train_seed 12 &
```

You may either edit the YAML config file directly or override individual parameters via command-line arguments, as `--eta`, `--num_critics`, and `--train_seed` do in the command above.
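
Because parameters can be overridden on the command line, a sweep over seeds can be scripted without touching the YAML file. The following is a minimal sketch rather than part of the repository: the helper `build_commands` and the seed values are hypothetical, while the script path and flags are copied from the training command above.

```python
import shlex

# Script path and flags are taken from the training command in this README.
SCRIPT = "algorithms/offline/UniqueBehavior.py"


def build_commands(config_path, seeds, eta=1.0, num_critics=30):
    """Return one argv list per seed, overriding YAML values on the command line."""
    commands = []
    for seed in seeds:
        commands.append([
            "python3", SCRIPT,
            "--config_path", config_path,
            "--eta", str(eta),
            "--num_critics", str(num_critics),
            "--train_seed", str(seed),
        ])
    return commands


# Hypothetical seed values; substitute the seeds you actually want to run.
commands = build_commands(
    "configs/offline/edac/halfcheetah/medium_v2.yaml",
    seeds=[0, 1, 2, 3, 4],
)
for argv in commands:
    # Print each command as a shell-safe string for inspection.
    print(shlex.join(argv))
```

Each argv list can then be passed to `subprocess.run(argv, check=True)` to launch the runs one after another.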

---

## Logging with Weights & Biases (wandb)

This codebase uses [Weights & Biases](https://wandb.ai/) for experiment tracking.
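
If you do not have a wandb account, wandb can be switched to offline logging through its `WANDB_MODE` environment variable (`offline` stores runs locally, `disabled` turns logging off). The sketch below only prepares such an environment; the helper `wandb_env` is hypothetical, and it assumes the training script does not set the wandb mode itself.

```python
import os


def wandb_env(mode="offline", base=None):
    """Return a copy of the environment with WANDB_MODE set to the given mode."""
    env = dict(os.environ if base is None else base)
    env["WANDB_MODE"] = mode
    return env


# Small explicit base environment used here only to keep the example readable.
env = wandb_env("offline", base={"PATH": "/usr/bin"})
print(env["WANDB_MODE"])
```

Passing the result as `env=wandb_env()` to `subprocess.run` applies the setting to a single training run without changing your shell configuration.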

---

## File Structure

- `algorithms/offline/UniqueBehavior.py`: Main training script implementing our method.
- `configs/`: YAML configs for different D4RL tasks.
- `requirements/requirements.txt`: Dependency list.

---

## Notes

- Experiments use **5 random seeds** and **10 evaluation episodes per seed**, as described in the paper.  
- Performance is normalized following *Fu et al., 2020*.  
- Diversity is computed following *Osa and Harada, 2024*.  
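
The normalization of Fu et al. (2020) rescales raw returns so that a random policy scores 0 and an expert policy scores 100. The sketch below illustrates that formula and one way to aggregate it over seeds and evaluation episodes. The function names, the reference returns, and the episode returns are placeholders, and averaging within each seed before averaging across seeds is an assumption, not necessarily the procedure used in the paper. The diversity metric is not sketched here.

```python
from statistics import mean


def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalization: 0 = random policy, 100 = expert policy."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


def aggregate(returns_per_seed, random_return, expert_return):
    """Average normalized episode returns within each seed, then across seeds."""
    per_seed = [
        mean(normalized_score(r, random_return, expert_return) for r in episode_returns)
        for episode_returns in returns_per_seed
    ]
    return mean(per_seed), per_seed


# Placeholder numbers for illustration only (not results from the paper).
RANDOM_RETURN, EXPERT_RETURN = 0.0, 1000.0
returns_per_seed = [[500.0, 700.0], [600.0, 600.0]]  # 2 seeds x 2 episodes
overall, per_seed = aggregate(returns_per_seed, RANDOM_RETURN, EXPERT_RETURN)
print(overall, per_seed)
```

In the setting described above, `returns_per_seed` would hold 5 lists of 10 episode returns each, with the reference returns taken from the D4RL benchmark for the chosen environment.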

---

## References

- Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
- Takayuki Osa and Tatsuya Harada. Discovering multiple solutions from a single task in offline reinforcement learning. International Conference on Machine Learning, 2024.
