# Important

The version in this supplemental material does not include the necessary IsaacGym asset files. Because uploads are limited to 100 MB, we deleted most of the IsaacGym assets, so this version is only for showcasing the code and reward functions, not for running.


If you want to run the program, please download the zip from this anonymous Google Drive link to obtain runnable code:

[https://drive.google.com/drive/folders/1KiNJKgsIsVj_2YTxuJwpVD20AtNTgFg8?usp=sharing](https://drive.google.com/drive/folders/1KiNJKgsIsVj_2YTxuJwpVD20AtNTgFg8?usp=sharing)


The main contents of this supplemental version are:

* The prompts for the experiments are placed under the folder "RF_Agent/utils/prompts_rfagent".

* The reward functions generated by the different baselines are placed under the folder "RF_Agent/reward_functions". In each Python file, the reward function is at the bottom, under the function name "compute_reward".

* The core method of RF-Agent is placed under the folder "RF_Agent/rf_agent_algo".
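Since this version is meant for reading rather than running, it can be handy to pull `compute_reward` out of a file without importing it (the files may depend on IsaacGym). The following is a minimal sketch using only the standard library; the helper name `extract_compute_reward` and the example source string are illustrative and not part of the repository.

```python
import ast
from typing import Optional


def extract_compute_reward(source: str, func_name: str = "compute_reward") -> Optional[str]:
    """Return the source text of the last top-level function named `func_name`, or None."""
    tree = ast.parse(source)
    matches = [
        node for node in tree.body
        if isinstance(node, ast.FunctionDef) and node.name == func_name
    ]
    if not matches:
        return None
    # The reward function is described as sitting at the bottom of the file,
    # so take the last match if the name appears more than once.
    return ast.get_source_segment(source, matches[-1])


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a file under RF_Agent/reward_functions.
    example = (
        "import math\n"
        "\n"
        "def helper(x):\n"
        "    return x * 2\n"
        "\n"
        "def compute_reward(velocity):\n"
        "    return math.tanh(velocity)\n"
    )
    print(extract_compute_reward(example))
```

To apply it to a real file, read the file's text with `pathlib.Path(...).read_text()` and pass it in. Decorators on the function, if any, are not part of the returned segment.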


If you want to run the program, please download the zip from the above anonymous Google Drive link to obtain runnable code.


# Installation

RF-Agent requires Python ≥ 3.8. We have tested on Ubuntu 20.04 and 22.04.


1. Create a new conda environment with:

```bash
conda create -n rfagent python=3.8
conda activate rfagent
```


2. Install IsaacGym (tested with `Preview Release 4/4`). Follow the [instructions](https://developer.nvidia.com/isaac-gym) to download the package.

```bash
tar -xvf IsaacGym_Preview_4_Package.tar.gz
cd isaacgym/python
pip install -e .

# Test that IsaacGym was installed successfully
python examples/joint_monkey.py
```


3. Install RF-Agent

First, download the zip from the Google Drive link shown in the "Important" section. (To support automatic replacement of the reward function, we made minimal modifications to IsaacGym and Bi-DexHands, as Eureka does. We therefore recommend using the packages from the modified zip directly, rather than installing Bi-DexHands and rl_games separately.)

Second, unzip the file.

Third:

```bash
cd RFAgent; pip install -e .
cd isaacgymenvs; pip install -e .
cd ../rl_games; pip install -e .
```


4. Set the OpenAI API key


You need an OpenAI API key to use RF-Agent. Set the environment variable in your terminal (note that there must be no spaces around `=`):

```bash
export OPENAI_API_KEY="YOUR_API_KEY"
```
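Before launching a long search, it may help to confirm that the key is actually visible to Python. The snippet below is a minimal sketch of such a check; the function name `get_openai_api_key` is hypothetical and not part of RF-Agent.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    # Reject an unset variable as well as the placeholder from the example command.
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running RF-Agent.")
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("OPENAI_API_KEY found.")
    except RuntimeError as err:
        print(err)
```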



# Running RF-Agent

To use RF-Agent to generate and optimize reward functions, run:

```bash
python rfagent.py env=ant model=gpt-4o-mini simulations=80 max_iterations=500 test_max_iterations=500
```
* `env` is the task to perform. Options are listed in "RF_Agent/cfg/env".

* `model` is the LLM backbone. We recommend gpt-4o-mini for its fast responses and low cost.

* `simulations` is the total number of searches. The default is 80 for IsaacGym and 512 for Bi-DexHands.

* `max_iterations` is the number of update iterations per training run during the search stage, and `test_max_iterations` is the number of update iterations per training run during the final evaluation stage. We set `test_max_iterations` to the benchmark default, and `max_iterations` is often half of `test_max_iterations` to speed up the search. Note that the benchmark default number of iterations differs between tasks; please refer to "different_environment_itration_nums.txt".
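The relationship between the two iteration settings can be sketched as a small command builder. This is an illustration only: `build_search_command` and its `halve_search` flag are hypothetical names, and the halving rule is the "often half" heuristic described above rather than a fixed requirement (the example command uses equal values, which `halve_search=False` reproduces).

```python
import shlex
from typing import List


def build_search_command(
    env: str,
    model: str,
    simulations: int,
    test_max_iterations: int,
    halve_search: bool = True,
) -> List[str]:
    """Assemble the rfagent.py command line from the four documented parameters."""
    # Search-stage training runs often use half of the evaluation-stage iterations.
    max_iterations = test_max_iterations // 2 if halve_search else test_max_iterations
    return [
        "python",
        "rfagent.py",
        f"env={env}",
        f"model={model}",
        f"simulations={simulations}",
        f"max_iterations={max_iterations}",
        f"test_max_iterations={test_max_iterations}",
    ]


if __name__ == "__main__":
    cmd = build_search_command("ant", "gpt-4o-mini", simulations=80, test_max_iterations=500)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` from the directory that contains `rfagent.py`.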


# Evaluating different reward functions

We provide the best reward functions generated by different methods (RF-Agent/Revolve/Eureka).

You can directly evaluate these reward functions by running:

```bash
python test.py env=ant test_reward_function="/reward_functions/isaac/RFAgent/Ant-4o-mini.py"  test_max_iterations=500
```
* `env` is the task to perform. Options are listed in "RF_Agent/cfg/env".

* `test_reward_function` is the reward function to evaluate.

* `test_max_iterations` is the number of update iterations per training run during the final evaluation stage. We set `test_max_iterations` to the benchmark default. Note that the benchmark default number of iterations differs between tasks; please refer to "different_environment_itration_nums.txt".

* Make sure that `test_reward_function` and `env` refer to the same task.
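A quick sanity check for the last point can be sketched as follows. It assumes that reward files are named `<Task>-<model>.py`, which is inferred only from the single example path above and may not hold for every file; `reward_file_matches_env` is a hypothetical helper, not part of the repository.

```python
from pathlib import PurePosixPath


def _normalize(name: str) -> str:
    """Lower-case a task name and drop underscores so 'Shadow_Hand' matches 'shadowhand'."""
    return name.replace("_", "").lower()


def reward_file_matches_env(env: str, reward_path: str) -> bool:
    """Heuristic: compare `env` with the part of the file name before the first '-'."""
    stem = PurePosixPath(reward_path).stem  # e.g. "Ant-4o-mini"
    task = stem.split("-", 1)[0]            # e.g. "Ant"
    return _normalize(task) == _normalize(env)


if __name__ == "__main__":
    path = "/reward_functions/isaac/RFAgent/Ant-4o-mini.py"
    print(reward_file_matches_env("ant", path))
    print(reward_file_matches_env("humanoid", path))
```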



# Other Notes

1. Each seed involves parallel sampling and shuffling across thousands of environments, which takes up a lot of memory. To reduce memory use, you can set `num_eval` to 1 to test only with seed 0, e.g.:

```bash
python test.py env=ant test_reward_function="/reward_functions/isaac/RFAgent/Ant-4o-mini.py"  test_max_iterations=500 num_eval=1
```


# Thanks

The code for RF-Agent is based on Eureka. We thank the authors for their work:

[https://github.com/eureka-research/Eureka](https://github.com/eureka-research/Eureka)

