# Progress Reward Model for Reinforcement Learning



## Introduction

This is the supplementary material for NeurIPS 2025 Submission #12659, "Progress Reward Model for Reinforcement Learning via Large Language Models".

This supplementary material covers:
1. Reproducing the evaluation results shown in the `Experiment` section of the paper.
2. Regenerating the LLM responses.

## Notice

The latest versions of `MetaWorld` and `ManiSkill` require different versions of `gymnasium`
(`gymnasium==1.1.1` for MetaWorld and `gymnasium==0.29.1` for ManiSkill),
so we recommend using two separate environments.
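As a quick sanity check after installation, the sketch below compares the installed `gymnasium` version against the version quoted above for each environment. It is only an illustration: the helper names `installed_version` and `check_gymnasium` are hypothetical and not part of this repository, and the mapping assumes the conda environment names used in the Installation section (`prmmw` and `prmms`).

```python
import os
from importlib.metadata import PackageNotFoundError, version

# gymnasium versions quoted in the Notice, keyed by the conda environment
# names used in the Installation section (assumed, not enforced by the repo).
EXPECTED_GYMNASIUM = {"prmmw": "1.1.1", "prmms": "0.29.1"}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_gymnasium(env_name, found=None):
    """Return True if `found` (default: installed gymnasium) matches the expected version."""
    if found is None:
        found = installed_version("gymnasium")
    return found == EXPECTED_GYMNASIUM[env_name]


if __name__ == "__main__":
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    env = os.environ.get("CONDA_DEFAULT_ENV", "")
    if env in EXPECTED_GYMNASIUM:
        status = "OK" if check_gymnasium(env) else "MISMATCH"
        print(f"{env}: gymnasium {installed_version('gymnasium')} -> {status}")
    else:
        print("Active environment is neither prmmw nor prmms; nothing to check.")
```

Running the script inside each activated environment should report `OK` if the versions match the ones listed in this notice.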


## Installation

### For MetaWorld
Create a virtual environment.
```sh
conda create -n prmmw python=3.9
conda activate prmmw
```

In this work, we use `pytorch==2.5.1` with `cuda==12.1`.
```sh
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```
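To confirm that the installed PyTorch build matches the versions above and can see a GPU, a small report like the following may help. This is a sketch rather than part of the repository; `torch_report` is a hypothetical helper name, and it returns early if `torch` is not installed.

```python
import importlib.util


def torch_report():
    """Collect basic facts about the local PyTorch install, if there is one."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        "version": torch.__version__,          # e.g. a 2.5.1 build string
        "cuda_build": torch.version.cuda,      # CUDA version torch was built with, or None
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` on a machine with a GPU, the driver or the wheel index used during installation is the usual place to look.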

To use the [SimCSE sentence encoder](https://huggingface.co/princeton-nlp/unsup-simcse-bert-base-uncased) from Hugging Face, install `transformers`.
```sh
pip install transformers
```

We use [StableBaselines3](https://github.com/DLR-RM/stable-baselines3/) as the implementation framework.
```sh
pip install 'stable-baselines3[extra]'
```
Please refer to the [StableBaselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/guide/install.html#) to verify the installation.

Then, install the latest version of [MetaWorld](https://github.com/Farama-Foundation/Metaworld).
```sh
pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
```
Please refer to the [Meta-World documentation](https://github.com/Farama-Foundation/Metaworld) to verify the installation.

Finally, install the remaining dependencies.
```sh
pip install -r requirements-mw.txt
```

To reproduce our experimental results on MetaWorld, run the following script:
```shell
bash run_mw.sh
```

We use `tensorboard` to inspect the evaluation results.
```sh
tensorboard --logdir ./logs/training
```

### For ManiSkill
Create a virtual environment.
```sh
conda create -n prmms python=3.9
conda activate prmms
```

In this work, we use `pytorch==2.5.1` with `cuda==12.1`.
```sh
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
```

To use the [SimCSE sentence encoder](https://huggingface.co/princeton-nlp/unsup-simcse-bert-base-uncased) from Hugging Face, install `transformers`.
```sh
pip install transformers
```

We use [StableBaselines3](https://github.com/DLR-RM/stable-baselines3/) as the implementation framework.
```sh
pip install 'stable-baselines3[extra]'
```
Please refer to the [StableBaselines3 documentation](https://stable-baselines3.readthedocs.io/en/master/guide/install.html#) to verify the installation.

Then, install the latest version of [ManiSkill](https://github.com/haosulab/ManiSkill).
```sh
pip install --upgrade mani_skill
```
Please refer to the [ManiSkill documentation](https://maniskill.readthedocs.io/en/latest/user_guide/getting_started/installation.html) to verify the installation.

Finally, install the remaining dependencies.
```sh
pip install -r requirements-mw.txt
```

To reproduce our experimental results on ManiSkill, run the following script:
```shell
bash run_ms.sh
```

We use `tensorboard` to inspect the evaluation results.
```sh
tensorboard --logdir ./logs/training
```

To regenerate the LLM responses:
```shell
cd code_generation
bash run_maniskill.sh
bash run_metaworld.sh
```

## Acknowledgement
We gratefully acknowledge the use of the code implementation from [text2reward](https://github.com/xlang-ai/text2reward), which further builds upon [StableBaselines3](https://github.com/DLR-RM/stable-baselines3/).
Both works are duly cited in the paper.