# Reward-Design-for-Offline-RL

## Installation

```bash
conda create -n env_name python=3.9
conda activate env_name
cd code
bash install.sh
```


## Prompts
1. For general prompts, please refer to the file `reward_design/prompts/common_prompt.py`.
2. For task-specific prompts, please refer to the file `reward_design/prompts/d4rl_prompt.py`.
3. For the loss prompt template, please refer to the file `reward_design/prompts/preference_prompt.py`.

## Experiments

### Code Generation

1. To change the generation parameters, edit `reward_design/code_generation.py`.
2. In `scripts/run_code.sh`, list the environments you need in `envs` and comment out the rest.
3. Set your OpenAI API key and run the script below. Results are saved in the `./reward_logs` directory by default.

```bash
export OPENAI_API_KEY="your_api_key_here"
bash scripts/run_code.sh
```

### RL Training

By default, the script trains four reward functions in parallel, one for each iteration number `T ∈ {0, 1, 2, 3}`, so it requires four GPUs.

#### GPU Selection

- `$num = 0`: Uses `CUDA_VISIBLE_DEVICES=0,1,2,3`
- `$num = 1`: Uses `CUDA_VISIBLE_DEVICES=4,5,6,7`
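
The two cases above are consistent with a simple rule: group `num` covers four consecutive GPU indices starting at `4 * num`. The sketch below illustrates that rule in Python. The function name `visible_devices` and the `gpus_per_group` parameter are hypothetical, and the training scripts may implement the mapping differently.

```python
def visible_devices(num: int, gpus_per_group: int = 4) -> str:
    """Return a CUDA_VISIBLE_DEVICES string for GPU group `num`.

    Assumption: groups are contiguous blocks of `gpus_per_group` GPUs,
    which matches the two documented cases (num = 0 and num = 1).
    """
    if num < 0:
        raise ValueError("num must be non-negative")
    start = num * gpus_per_group
    return ",".join(str(i) for i in range(start, start + gpus_per_group))


print(visible_devices(0))  # 0,1,2,3
print(visible_devices(1))  # 4,5,6,7
```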

#### Reward Log Path

- `$reward_path` is the absolute path to the reward function log file.
- Example usage: `/.../env_name=halfcheetah-medium-expert-v2/YYYY-MM-DD-hh-mm-ss`
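
Before launching a run, it can help to check that `$reward_path` is absolute and follows the `env_name=<task>/<timestamp>` layout shown in the example. The helper below is a minimal sketch, not part of this repository: `parse_reward_path` is a hypothetical name, and the path used in the demo is made up for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def parse_reward_path(reward_path: str) -> tuple[str, datetime]:
    """Split a reward log path into (environment name, run timestamp).

    Assumes the last two path components look like
    `env_name=<task>/YYYY-MM-DD-hh-mm-ss`, as in the example above.
    """
    path = PurePosixPath(reward_path)
    if not path.is_absolute():
        raise ValueError("reward_path must be an absolute path")
    env_dir, stamp = path.parent.name, path.name
    prefix = "env_name="
    if not env_dir.startswith(prefix):
        raise ValueError(f"expected a directory named '{prefix}<task>', got '{env_dir}'")
    # strptime raises ValueError if the timestamp does not match the layout.
    run_time = datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")
    return env_dir[len(prefix):], run_time


# Hypothetical path, for illustration only.
env, run_time = parse_reward_path(
    "/tmp/reward_logs/env_name=halfcheetah-medium-expert-v2/2024-01-31-12-00-00"
)
print(env, run_time.isoformat())
```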

#### Run IQL on MuJoCo

```bash
bash scripts/run_mujoco_IQL.sh $num $reward_path
```

#### Run IQL on AntMaze

```bash
bash scripts/run_antmaze_IQL.sh $num $reward_path
```

#### Run IQL on Adroit

```bash
bash scripts/run_adroit_IQL.sh $num $reward_path
```

#### Run TD3+BC on MuJoCo

```bash
bash scripts/run_mujoco_TD3_BC.sh $num $reward_path
```

The default algorithm parameters are consistent with the original paper, except for `R_min` and `R_max`.

For the best RL training performance, tune `R_min` and `R_max` for each environment. Both are **argparse** command-line arguments, so you can set them in the `.sh` files:

```bash
python algos/IQL/main_iql.py --R_min <value> --R_max <value>
```
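
For readers unfamiliar with this pattern, the sketch below shows how two such flags can be declared with `argparse` and used. Only the flag names `--R_min` and `--R_max` come from this README. The helper names are hypothetical, and the min-max rescaling of rewards into `[R_min, R_max]` is an assumption about how the bounds might be applied, not a description of `main_iql.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the two reward-range flags (no defaults are assumed here)."""
    parser = argparse.ArgumentParser(description="Illustrative reward-range options")
    parser.add_argument("--R_min", type=float, required=True, help="lower reward bound")
    parser.add_argument("--R_max", type=float, required=True, help="upper reward bound")
    return parser


def rescale(rewards, r_min, r_max):
    """Linearly map rewards onto [r_min, r_max].

    Assumption: this is one plausible use of the bounds; the actual
    training code may clip or normalize rewards in a different way.
    """
    if r_min >= r_max:
        raise ValueError("R_min must be smaller than R_max")
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        # All rewards are equal, so place them at the midpoint of the range.
        return [(r_min + r_max) / 2.0 for _ in rewards]
    scale = (r_max - r_min) / (hi - lo)
    return [r_min + (r - lo) * scale for r in rewards]


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--R_min=-1", "--R_max=1"])
print(rescale([0.0, 5.0, 10.0], args.R_min, args.R_max))
```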



