# DRIVE on D4RL Benchmarks

This directory contains the implementation of the **DRIVE Framework** tested on D4RL benchmarks (e.g., MuJoCo tasks). This implementation is built upon the [Decision Transformer](https://github.com/kzl/decision-transformer) codebase and extends it with retrieval-augmented mechanisms and value-based critics.

## Installation

Experiments require **MuJoCo**.
Follow the instructions in the [mujoco-py repo](https://github.com/openai/mujoco-py) to install.

Then, install the dependencies:

```bash
conda env create -f conda_env.yml
conda activate decision-transformer
```
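Before moving on, it can help to confirm that the environment resolves the main packages. The following is a minimal sketch, not part of the repository; the module names in `REQUIRED_MODULES` are assumptions about what `conda_env.yml` and the later steps provide, so adjust the list to match your setup.

```python
import importlib.util

# Assumed import names for the dependencies used in this README;
# edit this list to match what conda_env.yml actually installs.
REQUIRED_MODULES = ["mujoco_py", "gym", "d4rl", "torch"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

`find_spec` only locates a module without importing it, so this check does not verify that MuJoCo itself is correctly configured.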

### External Dependencies

This project uses external codebases for the Critic implementations. Please set them up manually:

1. **CQL (rlkit)**:
   * Clone [https://github.com/aviralkumar2907/CQL](https://github.com/aviralkumar2907/CQL).
   * Copy the `rlkit` folder into the root of this `D4RL/` directory.

   The `rlkit` folder is copied directly from that repository, and `rlkit/torch/sac/cql.py` closely follows the original code.


2. **IQL**:
   * The IQL implementation is adapted from [https://github.com/gwthomas/IQL-PyTorch](https://github.com/gwthomas/IQL-PyTorch). The relevant files are already included; check that repository's requirements if needed.



## Downloading Datasets

Datasets are stored in the `data` directory.

1. Install the [D4RL repo](https://github.com/rail-berkeley/d4rl) following their instructions.
2. Run the following script to download datasets and save them in the format required by Decision Transformer (`.pkl`):

```bash
python download_d4rl_datasets.py
```
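To sanity-check a downloaded file, a short summary of its contents can be useful. The sketch below assumes each `.pkl` file holds a pickled list of trajectory dictionaries with a `rewards` entry, as in the Decision Transformer codebase; the helper names and the example path `data/hopper-medium-v2.pkl` are illustrative assumptions, not part of this repository.

```python
import pickle


def load_trajectories(path):
    """Load a pickled list of trajectory dicts (assumed file layout)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_trajectories(trajectories):
    """Compute basic statistics from the per-step rewards of each trajectory."""
    if not trajectories:
        raise ValueError("expected at least one trajectory")
    returns = [float(sum(traj["rewards"])) for traj in trajectories]
    lengths = [len(traj["rewards"]) for traj in trajectories]
    return {
        "num_trajectories": len(trajectories),
        "num_timesteps": sum(lengths),
        "mean_return": sum(returns) / len(returns),
        "max_return": max(returns),
    }


# Hypothetical usage, assuming the file exists after the download step:
# stats = summarize_trajectories(load_trajectories("data/hopper-medium-v2.pkl"))
# print(stats)
```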

## Usage Pipeline

The training and evaluation process consists of **4 steps**. Please execute them in the following order.

### Step 1: Policy Training (DT)

Train the Decision Transformer to serve as the policy and the encoder.

```bash
# Example: hopper-medium-v2
python experiment.py --env hopper --dataset medium --model_type dt
```

*Adding `-w True` will log results to Weights and Biases.*

### Step 2: Index Construction

Generate state embeddings and build the retrieval index using the trained DT model.

```bash
python get_code.py --env hopper-medium-v2
```
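As a rough illustration of what a retrieval index over state embeddings does, the sketch below pairs each embedding with a stored action and answers queries by brute-force nearest-neighbor search. This is an assumption-laden toy, not the logic of `get_code.py`: the function names, the choice of Euclidean distance, and the idea of storing actions alongside embeddings are all hypothetical.

```python
def build_index(embeddings, actions):
    """Pair each state embedding with the action recorded for that state."""
    if len(embeddings) != len(actions):
        raise ValueError("embeddings and actions must have the same length")
    return list(zip(embeddings, actions))


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(index, query, k=1):
    """Return the actions of the k index entries closest to the query embedding."""
    ranked = sorted(index, key=lambda entry: squared_distance(entry[0], query))
    return [action for _, action in ranked[:k]]


# Toy usage with 2-D embeddings and string placeholders for actions.
index = build_index([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]], ["a", "b", "c"])
nearest = retrieve(index, [0.9, 0.9], k=2)
```

A brute-force scan is linear in the dataset size; a real index over a full D4RL dataset would likely rely on an approximate or vectorized search instead.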

### Step 3: Critic Training

Train the value function (Critic) to evaluate candidate actions. Choose **one** of the following:

**Option A: Train IQL Critic**

```bash
python train_iql.py --env hopper-medium-v2
```

**Option B: Train CQL Critic**

```bash
python train_cql.py --env hopper-medium-v2
```
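To make "evaluate candidate actions" concrete, one plausible use of a trained critic is to score each candidate for the current state and keep the highest-scoring one. The sketch below shows that idea only; `select_action` and the `q_function(state, action)` callable are hypothetical stand-ins, and the actual selection rule used by the evaluation scripts may differ.

```python
def select_action(state, candidates, q_function):
    """Return the candidate action with the highest critic score for `state`.

    `q_function` is any callable mapping (state, action) to a scalar score;
    here it stands in for a trained IQL or CQL critic.
    """
    if not candidates:
        raise ValueError("expected at least one candidate action")
    return max(candidates, key=lambda action: q_function(state, action))


# Toy critic that prefers actions close to the (scalar) state.
def toy_q(state, action):
    return -abs(action - state)


best = select_action(2.0, [0.0, 1.5, 4.0], toy_q)
```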

### Step 4: Evaluation

Run the retrieval-augmented evaluation. Select the script corresponding to the critic you trained.

**Evaluate with IQL Critic:**

```bash
python eval_retrieval_iql.py --env hopper-medium-v2
```

**Evaluate with CQL Critic:**

```bash
python eval_retrieval_cql.py --env hopper-medium-v2
```
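The four steps above can be chained from a small driver script. The following is a minimal sketch that only reuses the commands and flags shown in this README; the helper names `build_pipeline` and `run_pipeline`, and the assumption that the task name is always `<env>-<dataset>-v2`, are illustrative and not part of the repository.

```python
import subprocess


def build_pipeline(env, dataset, critic, version="v2"):
    """Return the four pipeline commands, in order, as argument lists."""
    if critic not in ("iql", "cql"):
        raise ValueError("critic must be 'iql' or 'cql'")
    task = f"{env}-{dataset}-{version}"  # e.g. hopper-medium-v2
    return [
        # Step 1: train the Decision Transformer policy and encoder.
        ["python", "experiment.py", "--env", env, "--dataset", dataset, "--model_type", "dt"],
        # Step 2: build the retrieval index.
        ["python", "get_code.py", "--env", task],
        # Step 3: train the chosen critic.
        ["python", f"train_{critic}.py", "--env", task],
        # Step 4: evaluate with the matching script.
        ["python", f"eval_retrieval_{critic}.py", "--env", task],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical usage from the D4RL/ directory:
# run_pipeline(build_pipeline("hopper", "medium", "iql"))
```

Using `check=True` stops the run as soon as one step exits with an error, so later steps never start from missing checkpoints or indexes.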

---

## Directory Structure

```text
D4RL/
├── data/                    # Dataset storage (.pkl files)
├── decision_transformer/    # DT implementation
├── rlkit/                   # (External) RL toolkit from CQL repo
├── src/                     # Source code for policies and values
├── download_d4rl_datasets.py # Script to download and format data
├── experiment.py            # Step 1: Train DT policy
├── get_code.py              # Step 2: Index construction
├── train_cql.py             # Step 3: Train CQL critic
├── train_iql.py             # Step 3: Train IQL critic
├── eval_retrieval_cql.py    # Step 4: Evaluation with CQL critic
└── eval_retrieval_iql.py    # Step 4: Evaluation with IQL critic
```