# CPPO Training

This repository contains the implementation of **Contrastive Perception Policy Optimization (CPPO)**, built on the verl framework.

## Dataset

To train CPPO, first download the **ViRL39K** dataset and place it in a local directory before preprocessing.
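
A minimal sketch of fetching ViRL39K with the Hugging Face CLI. It assumes the dataset is published on the Hub as `TIGER-Lab/ViRL39K` and that `huggingface-cli` (from the `huggingface_hub` package) is installed. The `fetch_virl39k` helper and the default `VIRL_DIR` location are hypothetical; whatever directory you choose must match the data path you later set for preprocessing.

```bash
# Assumed Hub repo ID for ViRL39K; override VIRL_REPO if your copy lives elsewhere.
VIRL_REPO="${VIRL_REPO:-TIGER-Lab/ViRL39K}"
# Hypothetical local directory; must match the data path used for preprocessing.
VIRL_DIR="${VIRL_DIR:-./data/ViRL39K}"

# Download the dataset unless VIRL_DIR already exists and is non-empty.
fetch_virl39k() {
  if [ -d "$VIRL_DIR" ] && [ -n "$(ls -A "$VIRL_DIR")" ]; then
    echo "ViRL39K already present in $VIRL_DIR"
    return 0
  fi
  mkdir -p "$VIRL_DIR"
  huggingface-cli download "$VIRL_REPO" --repo-type dataset --local-dir "$VIRL_DIR"
}
```

Source the snippet and call `fetch_virl39k`; once the directory is populated, rerunning it is a no-op.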

## Preprocess the dataset

After downloading, set the data path in `prepare_dataset.py` to the location of your ViRL39K copy, then run the preprocessing script:

```
python prepare_dataset.py
```

## Install verl

Follow the official verl installation instructions to set up the training framework.
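
Once verl is installed, a quick pre-flight check can confirm that the packages a training run imports are visible to your Python interpreter. The sketch below assumes `python3` is the interpreter you train with; the `missing_modules` helper is hypothetical, and its default module list (`verl`, `torch`, `transformers`) is an assumption rather than verl's official dependency set.

```bash
# Print each Python module (from the arguments, or a default list) that fails to import.
missing_modules() {
  if [ "$#" -eq 0 ]; then
    # Assumed minimal set of packages a verl + Qwen2.5-VL run needs.
    set -- verl torch transformers
  fi
  for m in "$@"; do
    if ! python3 -c "import $m" >/dev/null 2>&1; then
      echo "$m"
    fi
  done
}
```

Running `missing_modules` with no arguments prints nothing when the environment is ready.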

## Prepare backbone model

Download the Qwen2.5-VL-3B-Instruct and Qwen2.5-VL-7B-Instruct backbone models from Hugging Face.
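
A minimal download sketch using the Hugging Face CLI, assuming `huggingface-cli` is installed. The Hub IDs `Qwen/Qwen2.5-VL-3B-Instruct` and `Qwen/Qwen2.5-VL-7B-Instruct` are the public Qwen releases; the `backbone_repo` and `fetch_backbone` helpers and the `MODEL_ROOT` directory are hypothetical. The printed local path is what you would point the training scripts at.

```bash
# Hypothetical root directory for local model copies.
MODEL_ROOT="${MODEL_ROOT:-./models}"

# Map a model size (3B or 7B) to its Hugging Face Hub repository ID.
backbone_repo() {
  case "$1" in
    3B) echo "Qwen/Qwen2.5-VL-3B-Instruct" ;;
    7B) echo "Qwen/Qwen2.5-VL-7B-Instruct" ;;
    *) echo "unknown size: $1 (expected 3B or 7B)" >&2; return 1 ;;
  esac
}

# Download the backbone into $MODEL_ROOT/<model name> and print the local path.
fetch_backbone() {
  local repo dest
  repo="$(backbone_repo "$1")" || return 1
  dest="$MODEL_ROOT/${repo#*/}"
  # Send the CLI's own output to stderr so stdout carries only the path.
  huggingface-cli download "$repo" --local-dir "$dest" >&2
  echo "$dest"
}
```

For example, `fetch_backbone 3B` downloads into `./models/Qwen2.5-VL-3B-Instruct` by default.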

## Train 3B model

Set the variables in `recipe/cppo/train_3B.sh` to match your environment, then run:

```
bash recipe/cppo/train_3B.sh
```

## Train 7B model

Set the variables in `recipe/cppo/train_7B.sh` to match your environment, then run:

```
bash recipe/cppo/train_7B.sh
```
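
Both training scripts are launched the same way. The sketch below is a hypothetical wrapper (`run_cppo` is not part of the repository) that checks the recipe script exists, writes its combined output to `logs/train_<size>.log`, and returns the script's exit status. It assumes you run it from the repository root after editing the variables in the script.

```bash
# Run recipe/cppo/train_<size>.sh, logging stdout and stderr to logs/train_<size>.log.
run_cppo() {
  local size script log status
  size="$1"
  script="recipe/cppo/train_${size}.sh"
  log="logs/train_${size}.log"
  if [ ! -f "$script" ]; then
    echo "no such recipe: $script" >&2
    return 1
  fi
  mkdir -p logs
  bash "$script" > "$log" 2>&1
  status=$?
  echo "train_${size}.sh exited with status $status (log: $log)"
  return "$status"
}
```

Redirecting to a file rather than piping through `tee` keeps the training script's exit status intact; follow progress with `tail -f logs/train_3B.log` from another terminal.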

