# MM-PRM

## 📊 MM-K12 Dataset

The MM-K12 dataset is available on Hugging Face: https://huggingface.co/datasets/MM-PRM/MM-K12

## 🏁 Getting Started

### 📦 Installation

```shell
cd MM-PRM
pip install -r requirements.txt

# Install flash-attn 2.3.6:
pip install flash-attn==2.3.6 --no-build-isolation

# Alternatively, compile it from source instead of using the pip command above:
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.6
python setup.py install
```

### 📂 Data Pipeline

1. **Seed dataset preparation**

   To begin, prepare a seed dataset of verifiable problems. The dataset is a JSON array in which each example is an object with the following fields:

   ```json
   [
       {
           "id": "unique identifier for the problem",
           "question": "problem statement",
           "correct_answer": "ground-truth final answer for evaluation and verification",
           "image_path": "/path/to/image.png"
       },
       ...
   ]
   ```

   This dataset is the input to the data generation engine, which produces annotated solution trees with step-wise correctness labels.

   To enable parallel data generation, split the seed dataset into smaller chunks:

   ```shell
   cd data_pipeline
   python process_json.py
   ```

2. **API endpoint setup (Optional)**

   The data generation process calls an API endpoint to automatically verify whether the final answer in a rollout is correct. If you do not already have such an endpoint, you can deploy a model (e.g., Qwen2.5) locally to act as the answer judge.

   We recommend using [vLLM](https://docs.vllm.ai/) to deploy a local model.

3. **Run data generation**

   Once everything is set up, run the data generation pipeline to produce step-level supervision data.

   Before running, make sure that all necessary parameters are set correctly in `run_data_pipeline.sh` or passed as environment variables.

   ```shell
   sh run_data_pipeline.sh
   ```

4. **Sample training data from the annotated trees**

   After generating annotated reasoning trees with the data generation pipeline, sample step-by-step solution paths from these trees to construct the training data for the Process Reward Model (PRM):

   ```shell
   python traverse.py
   ```

   The next step is to convert this data into the format required for PRM training. Use the following script to perform the formatting:

   ```shell
   python prm_data_format.py
   ```
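The seed dataset format and chunking from step 1 can be illustrated with a short Python sketch. It is not the repository's `process_json.py`, whose actual behavior and options are not shown here. The function names, the even-split strategy, and the `chunk_<i>.json` file naming are all assumptions made for illustration; only the four field names come from the format described above.

```python
import json
from pathlib import Path

# Field names taken from the seed dataset format in step 1.
REQUIRED_FIELDS = ("id", "question", "correct_answer", "image_path")


def validate_seed(examples):
    """Return a list of human-readable problems found in the seed examples."""
    problems = []
    seen_ids = set()
    for index, example in enumerate(examples):
        missing = [field for field in REQUIRED_FIELDS if field not in example]
        if missing:
            problems.append(f"example {index}: missing fields {missing}")
        if "id" in example:
            if example["id"] in seen_ids:
                problems.append(f"example {index}: duplicate id {example['id']!r}")
            seen_ids.add(example["id"])
    return problems


def split_into_chunks(examples, num_chunks):
    """Split examples into num_chunks lists whose sizes differ by at most one."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    base, extra = divmod(len(examples), num_chunks)
    chunks, start = [], 0
    for chunk_index in range(num_chunks):
        size = base + (1 if chunk_index < extra else 0)
        chunks.append(examples[start:start + size])
        start += size
    return chunks


def write_chunks(chunks, output_dir):
    """Write each chunk to output_dir/chunk_<i>.json (hypothetical naming)."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for chunk_index, chunk in enumerate(chunks):
        path = output_dir / f"chunk_{chunk_index}.json"
        path.write_text(json.dumps(chunk, ensure_ascii=False, indent=2), encoding="utf-8")
        paths.append(path)
    return paths
```

Running a check like `validate_seed` before splitting catches missing fields or duplicate ids early, before any parallel generation jobs are launched on the chunks.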

### 🌐 Start PRM Training

Create a dataset configuration JSON file in `internvl_chat/shell/data/` with the following format:

```json
{
  "your-custom-prm_dataset": {
    "root": "/path/to/the/image/root",
    "annotation": "/path/to/the/jsonl/annotation",
    "data_augment": false,
    "repeat_time": 1,
    "length": "number of samples in the dataset"
  }
}
```
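The `length` field can be filled in by counting the samples in the annotation file. The Python sketch below assumes the annotation is a JSONL file with one sample per non-empty line and that `length` is expected as an integer. The helper names `count_jsonl_samples` and `build_dataset_config` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def count_jsonl_samples(annotation_path):
    """Count non-empty lines in a JSONL file, assuming one sample per line."""
    with open(annotation_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def build_dataset_config(name, image_root, annotation_path):
    """Build a config entry with the same keys as the example above."""
    return {
        name: {
            "root": str(image_root),
            "annotation": str(annotation_path),
            "data_augment": False,
            "repeat_time": 1,
            # Assumption: the training code expects an integer sample count.
            "length": count_jsonl_samples(annotation_path),
        }
    }


def write_dataset_config(config, output_path):
    """Serialize the config dictionary to a JSON file."""
    Path(output_path).write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Deriving `length` from the file, rather than typing it by hand, keeps the configuration consistent when the annotation is regenerated.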

Once the dataset configuration is in place, start training the PRM (the script path below is relative to `internvl_chat/`):

```shell
GPUS=8 sh shell/internvl2.5/2nd_finetune/internvl2_5_8b_dynamic_res_2nd_finetune_full_prm.sh
```

### 📊 Evaluation

We provide our **evaluation code** in the `eval/` directory.

