# 🛠️Installation
```bash
# Create and activate a new conda environment
conda create -n pr1 python=3.10 -y
conda activate pr1

# After cloning the repository, install dependencies from its root
cd PR1
pip install -e ".[dev]"
pip install flash-attn==2.7.0.post2 --no-build-isolation
```

# 🔄Training
Before training, edit the script below to set your model and data paths, then launch it:
```bash
bash local_scripts/train/train_qwen2_2b_vl_grounding.sh
```
The script sets the hyperparameters, data loading options, and checkpointing behavior. To customize a run, edit values such as the learning rate, batch size, and optimizer settings directly in the script.

# 📊Evaluation

The evaluation commands below assume the following layout under `eval/`:

```text
eval/
├── images/
│   ├── coco/
│   ├── pixmo-count/
│   └── ocr/
└── jsons/
    ├── counting/
    ├── grounding/
    ├── ocr/
    └── detection/
```
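If you are starting from an empty checkout, the skeleton above can be created as follows. This is a minimal sketch that only makes empty folders from the repository root; the images and annotation files still have to be obtained separately.

```bash
# Create the directory skeleton shown above (run from the repository root).
mkdir -p eval/images/coco eval/images/pixmo-count eval/images/ocr
mkdir -p eval/jsons/counting eval/jsons/grounding eval/jsons/ocr eval/jsons/detection

# Report any expected directory that is still missing.
for d in eval/images/coco eval/images/pixmo-count eval/images/ocr \
         eval/jsons/counting eval/jsons/grounding eval/jsons/ocr eval/jsons/detection; do
    [ -d "$d" ] || echo "missing: $d"
done
```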
## Running Evaluations
### Counting Evaluation

```bash
python eval/evaluate_counting.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Counting' \
    --anno_dir 'eval/jsons/counting/' \
    --image_dir 'eval/images/'
```

### Grounding Evaluation
```bash
python eval/evaluate_grounding.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Grounding' \
    --anno_dir 'eval/jsons/grounding/' \
    --image_dir 'eval/images/coco/'
```

### Detection Evaluation
```bash
# pycocotools is needed for the COCO-style detection metrics
pip install pycocotools
python eval/evaluate_detecion.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-Detection' \
    --anno_dir 'eval/jsons/detection/coco_val2017.json' \
    --image_dir 'eval/images/coco/val2017/'
```
### OCR Evaluation
```bash
python eval/evaluate_ocr.py \
    --model_path 'Kangheng/PR1-Qwen2-VL-2B-OCR' \
    --anno_dir 'eval/jsons/ocr/' \
    --image_dir 'eval/images/ocr/'
```
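
To run all four evaluations in one pass, a small wrapper along these lines may help. It is a minimal sketch and not part of the repository: `run_eval` and the `DRY_RUN` switch are names made up for this example, while the script names, model IDs, and paths are copied from the commands above. By default it only prints the commands; set `DRY_RUN=0` to launch them.

```bash
# Sketch: run the four evaluations above back to back.
# DRY_RUN is a hypothetical switch for this example; 1 prints commands, 0 runs them.
set -eu
DRY_RUN="${DRY_RUN:-1}"

run_eval() {
    # $1: evaluation script, $2: model, $3: annotation path, $4: image directory
    if [ "$DRY_RUN" = "1" ]; then
        echo python "$1" --model_path "$2" --anno_dir "$3" --image_dir "$4"
    else
        python "$1" --model_path "$2" --anno_dir "$3" --image_dir "$4"
    fi
}

run_eval eval/evaluate_counting.py 'Kangheng/PR1-Qwen2-VL-2B-Counting' \
    'eval/jsons/counting/' 'eval/images/'
run_eval eval/evaluate_grounding.py 'Kangheng/PR1-Qwen2-VL-2B-Grounding' \
    'eval/jsons/grounding/' 'eval/images/coco/'
# Detection additionally needs pycocotools installed (see above).
run_eval eval/evaluate_detecion.py 'Kangheng/PR1-Qwen2-VL-2B-Detection' \
    'eval/jsons/detection/coco_val2017.json' 'eval/images/coco/val2017/'
run_eval eval/evaluate_ocr.py 'Kangheng/PR1-Qwen2-VL-2B-OCR' \
    'eval/jsons/ocr/' 'eval/images/ocr/'
```

Because `set -e` is active, a real run (`DRY_RUN=0`) stops at the first evaluation that exits with an error rather than continuing to the next one.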

# 📈Results
## Grounding
![Evaluation of Grounding](assets/grounding_results.jpg)

## OCR
![Evaluation of OCR](assets/ocr_results.jpg)

## Counting and Detection
![Evaluation of Counting and Detection](assets/counting_detection_results.jpg)

# Example Cases
![OCR Case](assets/ocr_case.jpg)
![Counting Case](assets/counting_case.jpg)
![Detection Case](assets/detection_case.jpg)
![Grounding Case](assets/grounding_case.jpg)

# Acknowledgement

This work builds on several open-source projects. We thank the authors of the following repositories, which inspired our research:
- [R1-V](https://github.com/Deep-Agent/R1-V)
- [R1-Multimodal-Journey](https://github.com/FanqingM/R1-Multimodal-Journey/tree/main)
- [open_r1](https://github.com/huggingface/open-r1)
