# Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
This repository contains the source code for our paper submitted to ICLR 2026.
It includes a demo trained on the LIBERO dataset.

## Training
**Before starting training, complete the following steps:**

- (1) Download the [LIBERO datasets](https://huggingface.co/datasets/openvla/modified_libero_rlds) to `./data/libero/`.

- (2) Download the pretrained [OpenVLA checkpoints](https://huggingface.co/openvla/openvla-7b/tree/main) to `./ckpts/` (the LoRA merge step below expects them in `ckpts/openvla-7b`).

- (3) Download the pretrained [VGGT checkpoint](https://huggingface.co/facebook/VGGT-1B/blob/main/model.pt) (`model.pt`) to `./ckpts/`.
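The three downloads above can also be scripted. The sketch below is hypothetical and not part of this repository: it uses the `huggingface_hub` functions `snapshot_download` and `hf_hub_download`, and it assumes the OpenVLA checkpoint lives in `./ckpts/openvla-7b/` (the path used by the merge command) and the VGGT weights in `./ckpts/model.pt`. Check these paths against what `train.sh` actually reads before relying on them.

```python
"""Hypothetical helper: download the training prerequisites and check their layout."""
from pathlib import Path

# Assumed layout. ./data/libero/ and ./ckpts/ come from this README; the
# openvla-7b subfolder matches the merge command, and ckpts/model.pt is a guess.
LIBERO_DIR = Path("data/libero")
OPENVLA_DIR = Path("ckpts/openvla-7b")
VGGT_FILE = Path("ckpts/model.pt")


def download_prerequisites(root: Path = Path(".")) -> None:
    """Fetch LIBERO (RLDS), OpenVLA-7B, and VGGT-1B from the Hugging Face Hub."""
    # Imported here so the layout check works even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, snapshot_download

    snapshot_download(
        repo_id="openvla/modified_libero_rlds",
        repo_type="dataset",
        local_dir=root / LIBERO_DIR,
    )
    snapshot_download(repo_id="openvla/openvla-7b", local_dir=root / OPENVLA_DIR)
    hf_hub_download(
        repo_id="facebook/VGGT-1B",
        filename=VGGT_FILE.name,
        local_dir=root / VGGT_FILE.parent,
    )


def _non_empty_dir(path: Path) -> bool:
    """True if `path` is a directory containing at least one entry."""
    return path.is_dir() and any(path.iterdir())


def missing_prerequisites(root: Path = Path(".")) -> list[str]:
    """Return the expected paths that are absent under `root`; empty means ready."""
    missing = []
    if not _non_empty_dir(root / LIBERO_DIR):
        missing.append(LIBERO_DIR.as_posix())
    if not _non_empty_dir(root / OPENVLA_DIR):
        missing.append(OPENVLA_DIR.as_posix())
    if not (root / VGGT_FILE).is_file():
        missing.append(VGGT_FILE.as_posix())
    return missing


if __name__ == "__main__":
    # download_prerequisites()  # uncomment to actually download (large files)
    gaps = missing_prerequisites()
    print("ready to run train.sh" if not gaps else f"missing: {gaps}")
```

Keeping the download separate from the check lets you rerun the check cheaply after a manual or partial download.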

**Then start training:**
```
bash train.sh
```

## Evaluation
**First, merge the LoRA weights into the base model (replace `YOUR_RUN_ID` with the name of your training run directory):**
```
python vla-scripts/merge_lora_weights_and_save.py \
    --base_checkpoint ckpts/openvla-7b \
    --lora_finetuned_checkpoint_dir ckpts/training_results/YOUR_RUN_ID
```
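For reference, merging a LoRA adapter conventionally means folding its low-rank update into the frozen base weight, W' = W + (alpha / r) B A. The NumPy sketch below illustrates that standard formula on toy matrices; the sizes, rank `r`, and scaling `alpha` are arbitrary, and this is not the implementation inside `merge_lora_weights_and_save.py`.

```python
import numpy as np


def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, i.e. the base weight with the LoRA update folded in.

    W: (d_out, d_in) frozen base weight
    A: (r, d_in)     LoRA down-projection
    B: (d_out, r)    LoRA up-projection
    """
    return W + (alpha / r) * (B @ A)


rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 16, 4, 8  # illustrative sizes, not the repository's settings
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))
x = rng.normal(size=(d_in,))

# Unmerged forward pass: base path plus the scaled low-rank adapter path.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
# Merged forward pass: a single matmul with the folded weight.
y_merged = merge_lora(W, A, B, alpha, r) @ x

print(np.allclose(y_adapter, y_merged))  # True
```

Because both forward passes agree, the merged model computes the same outputs as base plus adapter while storing only one weight matrix per adapted layer.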

**Then run the evaluation:**
```
python experiments/robot/libero/run_libero_eval.py \
    --pretrained_checkpoint ckpts/training_results/YOUR_RUN_ID \
    --task_suite_name libero_spatial
```
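`--task_suite_name` selects one LIBERO suite per run. To evaluate several suites in sequence, a small driver like the hypothetical one below can build one command per suite. Only `libero_spatial` appears in this README; `libero_object`, `libero_goal`, and `libero_10` are assumed from the standard LIBERO benchmark, so confirm the values `run_libero_eval.py` accepts before relying on them.

```python
import subprocess
import sys

# Only libero_spatial appears in this README; the others are assumed standard LIBERO suites.
SUITES = ["libero_spatial", "libero_object", "libero_goal", "libero_10"]


def eval_command(run_dir: str, suite: str) -> list[str]:
    """Build the run_libero_eval.py invocation shown above for one task suite."""
    return [
        sys.executable,
        "experiments/robot/libero/run_libero_eval.py",
        "--pretrained_checkpoint", run_dir,
        "--task_suite_name", suite,
    ]


def run_all(run_dir: str, suites: list[str] = SUITES, dry_run: bool = True) -> list[list[str]]:
    """Print every command; if dry_run is False, also execute them one after another."""
    commands = [eval_command(run_dir, s) for s in suites]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing suite
    return commands


if __name__ == "__main__":
    # YOUR_RUN_ID is the same placeholder used in the commands above.
    run_all("ckpts/training_results/YOUR_RUN_ID", dry_run=True)
```

Defaulting to `dry_run=True` lets you inspect the exact commands before committing GPU time.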
