# VLM-R$^3$: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought

## Overview

- **VLIR Dataset**: [vlir_11k.json](vlir_11k.json)
- **Supervised Fine-Tuning Code**: [qwen_vl_finetune/](qwen_vl_finetune/)
- **R-GRPO Code**: [src/](src/)

This project builds on [R1-V](https://github.com/Deep-Agent/R1-V) and [VLM-R1](https://github.com/om-ai-lab/VLM-R1).