<h1 style="text-align:center; font-family: charter;">
    <img src="figures/icon.png" width="5%" style="vertical-align: -10px;"/> SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
</h1>

## 🌟 Methods

![Dataset Construction](figures/dataset.png)

We introduce SpatialLadder-26k, comprising 26,610 samples across four complementary task categories (object localization, single-image spatial reasoning, multi-view spatial reasoning, and video spatial reasoning) that form a complete spatial learning curriculum.

<img src="figures/framework.png" width="98.5%">

Building upon SpatialLadder-26k, we design a training framework that builds spatial intelligence through three progressive stages, targeting perception, understanding, and reasoning in turn. The framework follows the principle that robust spatial reasoning emerges from integrating these three capabilities, with each stage building on the foundations laid by the previous ones.


## 🎉 Performance

![In-domain Results](figures/in_domain_performance.png)
![Out-of-domain Results](figures/out_of_domain_performance.png)

## ⚙️ Setup

```bash
conda create -n spatial-mllm python=3.10 -y
conda activate spatial-mllm
bash setup.sh
```
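Assuming `setup.sh` installs packages into the currently active environment, it can help to confirm that `spatial-mllm` is active before running it. The helper below is a hypothetical sketch, not part of the repository; it relies only on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets to the active environment's name.

```bash
# Hypothetical helper: fail unless the expected conda environment is active.
# `conda activate <name>` sets CONDA_DEFAULT_ENV to <name>.
require_conda_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "error: expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# Usage (assumption: run from the repository root, where setup.sh lives):
# require_conda_env spatial-mllm && bash setup.sh
```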

## 🚀 Training

### Data Requirements

Training requires image files from [ScanNet](http://www.scan-net.org/). Download and prepare the ScanNet dataset, then place the required images in the `VLM-R1/data/images` folder, organized by scene and image ID, before starting training.
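The exact layout expected under `VLM-R1/data/images` is not spelled out here, so the following is only a sketch under stated assumptions: frames have already been exported from the ScanNet `.sens` files into `<SCANNET_ROOT>/<scene_id>/color/<frame_id>.jpg`, and the target layout is `VLM-R1/data/images/<scene_id>/<frame_id>.jpg`. The function name `organize_scannet_images` is hypothetical; check the image paths referenced by the training annotations and adjust both layouts to match.

```bash
# Hypothetical helper; both directory layouts are assumptions:
#   source: SRC/<scene_id>/color/<frame_id>.jpg  (frames exported from .sens files)
#   target: DST/<scene_id>/<frame_id>.jpg
organize_scannet_images() {
  local src="$1" dst="$2" scene_dir scene_id
  for scene_dir in "$src"/scene*/; do
    # Skip an unmatched glob and scenes without exported color frames.
    [ -d "${scene_dir}color" ] || continue
    scene_id="$(basename "$scene_dir")"
    mkdir -p "$dst/$scene_id"
    # Copy every exported JPEG frame of this scene into DST/<scene_id>/.
    find "${scene_dir}color" -maxdepth 1 -name '*.jpg' -exec cp {} "$dst/$scene_id/" \;
  done
}

# Usage (paths are placeholders):
# organize_scannet_images /path/to/scannet_frames VLM-R1/data/images
```

This copies every exported frame. If disk space is a concern, copying only the frames the annotations reference, or linking instead of copying, would be reasonable alternatives.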

### Quick Start

To train the model through all stages automatically:

```bash
cd VLM-R1/run_scripts
bash run_spld_all.sh
```

This runs **Stage 1**, **Stage 2**, the **Cold Start** step, and **Stage 3** in sequence. Each step must complete successfully before the next one begins.
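A full multi-stage run takes a long time, so keeping a log is often useful. Note that a plain `cmd | tee log` pipeline reports the exit status of `tee`, not of the training script, which can hide a failed stage. The hypothetical `run_logged` helper below (not part of the repository) enables `pipefail` inside a subshell so a failure still produces a non-zero status.

```bash
# Hypothetical helper: run a command, mirror stdout/stderr to a log file,
# and return the command's exit status rather than tee's.
run_logged() {
  local log="$1"; shift
  ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}

# Usage (from VLM-R1/run_scripts):
# run_logged train_all.log bash run_spld_all.sh || echo "training failed" >&2
```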

### Manual Training

For manual control or debugging purposes, you can run each training stage individually:

```bash
cd VLM-R1/run_scripts

# Stage 1
bash run_spld_stage1.sh

# Stage 2
bash run_spld_stage1_2.sh

# Cold Start
bash run_spld_stage1_2_cs.sh

# Stage 3
bash run_spld_stage1_2_cs_stage3.sh
```

> **Note**: Run the stages in the order listed above (Stage 1, Stage 2, Cold Start, Stage 3), since each stage depends on the outputs of the previous ones.
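When debugging, you may want to resume from a later stage without re-running the earlier ones. A minimal sketch, assuming the four scripts above are run from `VLM-R1/run_scripts` without arguments; `SPLD_STAGES` and `run_spld_from` are hypothetical names, not part of the repository:

```bash
# Stage scripts in dependency order (index 0 = Stage 1).
SPLD_STAGES=(
  run_spld_stage1.sh              # Stage 1
  run_spld_stage1_2.sh            # Stage 2
  run_spld_stage1_2_cs.sh         # Cold Start
  run_spld_stage1_2_cs_stage3.sh  # Stage 3
)

# Hypothetical helper: run stages from index START onward,
# stopping at the first script that exits non-zero.
run_spld_from() {
  local start="${1:-0}" i
  for (( i = start; i < ${#SPLD_STAGES[@]}; i++ )); do
    echo "==> ${SPLD_STAGES[i]}"
    bash "${SPLD_STAGES[i]}" || { echo "failed: ${SPLD_STAGES[i]}" >&2; return 1; }
  done
}

# Usage: resume at Cold Start once Stages 1 and 2 have finished.
# run_spld_from 2
```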

## 📊 Evaluation

### Quick Start

To evaluate the trained model:

```bash
cd VLM-R1/eval_spld
bash run_eval.sh
```

This runs the evaluation pipeline with the default configuration (`qwenvl_3b` on VSI-Bench, as set in `run_eval.sh`).

### Configuration

To change the evaluated model or benchmark, edit these variables in the `run_eval.sh` script directly:

```bash
MODEL_NAMES=("qwenvl_3b")
TASK=("VSI-Bench")
SUPPORTED_TASKS=("VSI-Bench" "SPBench-SI" "SPBench-MI" "SPAR-Bench" "ViewSpatial-Bench" "CV-Bench")
...
```
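For example, to evaluate on more than one benchmark, `TASK` could list several entries from `SUPPORTED_TASKS`. This assumes `run_eval.sh` loops over `TASK`, which the array form suggests but this README does not confirm. The `validate_tasks` function is a hypothetical addition that reports any requested task missing from `SUPPORTED_TASKS`:

```bash
# Example edit: evaluate on two benchmarks instead of only VSI-Bench.
TASK=("VSI-Bench" "CV-Bench")
SUPPORTED_TASKS=("VSI-Bench" "SPBench-SI" "SPBench-MI" "SPAR-Bench" "ViewSpatial-Bench" "CV-Bench")

# Hypothetical check: print each unsupported task and return 1 if any exist.
validate_tasks() {
  local t s found rc=0
  for t in "${TASK[@]}"; do
    found=0
    for s in "${SUPPORTED_TASKS[@]}"; do
      if [ "$t" = "$s" ]; then found=1; break; fi
    done
    if [ "$found" -eq 0 ]; then
      echo "unsupported task: $t" >&2
      rc=1
    fi
  done
  return "$rc"
}
```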

> **Note**: Ensure your model checkpoint path is correct and the evaluation data is properly prepared before running the evaluation script.
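A hypothetical preflight check along these lines, assuming a Hugging Face-style checkpoint directory (one containing `config.json`) and a local directory holding the evaluation data; `preflight`, `CKPT_PATH`, and `DATA_DIR` are illustrative names, not variables defined by `run_eval.sh`:

```bash
# Hypothetical preflight: verify the checkpoint and evaluation data exist.
# Assumes the checkpoint directory contains config.json (Hugging Face layout).
preflight() {
  local ckpt="$1" data="$2" rc=0
  if [ ! -f "$ckpt/config.json" ]; then
    echo "missing checkpoint config: $ckpt/config.json" >&2
    rc=1
  fi
  # The directory must exist and contain at least one entry.
  if [ ! -d "$data" ] || [ -z "$(ls -A "$data")" ]; then
    echo "evaluation data directory missing or empty: $data" >&2
    rc=1
  fi
  return "$rc"
}

# Usage (placeholder paths):
# preflight "$CKPT_PATH" "$DATA_DIR" && bash run_eval.sh
```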