# ReGUIDE

Official PyTorch implementation of "ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search".

This implementation is largely a fork of the verl framework (https://github.com/volcengine/verl) [1].

## Training and Evaluation Instructions

0. **Setup Environment**
   ```bash
   pip install -e .
   pip install vllm==0.8.5
   ```
1. **Prepare Dataset**
   
   1-1. Download the dataset from `osunlp/UGround-V1-Data` and subsample it [2].
   ```bash
   python src/select.py --output_dir ./datasets/uground_21k
   ``` 

   1-2. Convert to parquet for training.
   ```bash
   python src/prepare_uground.py --local_dir ./data/uground
   ```   

2. **Train**

   2-1. Learning to explain GUI images via reasoning.
   ```bash
   bash scripts/train_rl.sh
   ```
   2-2. Prepare the dataset for consistency training.
   ```bash
   bash scripts/prepare_consistent.sh 
   ```
   2-3. Learning to predict consistent coordinates under transformations.
   ```bash
   bash scripts/train_consistent.sh
   ```

3. **Evaluate**

   Run the `scripts/eval.sh` script to evaluate the model. To run the evaluations individually, refer to `src/evaluate_baseline.py` and `src/evaluate_test_time_scaling.py`.

   ```bash
   bash scripts/eval.sh

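   # To run the evaluations individually (set $M to the model directory and $D to the dataset):
   # python src/evaluate_baseline.py --model_dir $M --dataset $D
   # python src/evaluate_test_time_scaling.py --model_dir $M --dataset $D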
   ```
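
For orientation, GUI grounding benchmarks commonly count a prediction as correct when the predicted click point falls inside the target element's bounding box. The sketch below illustrates that metric under this assumption. The function names and the `(x1, y1, x2, y2)` box format are hypothetical and are not taken from `src/evaluate_baseline.py`, which may compute its scores differently.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2


def grounding_accuracy(points: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predicted points that fall inside their target boxes."""
    if len(points) != len(boxes):
        raise ValueError("points and boxes must have the same length")
    if not points:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(points, boxes))
    return hits / len(points)


if __name__ == "__main__":
    # Hypothetical predictions and targets, for illustration only.
    predictions = [(5.0, 5.0), (20.0, 20.0)]
    targets = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
    print(grounding_accuracy(predictions, targets))  # 0.5
```

Predictions and boxes must share one coordinate system (both in pixels, or both normalized) for the comparison to be meaningful.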

## References

[1] Guangming Sheng et al., HybridFlow: A Flexible and Efficient RLHF Framework, 2024

[2] Boyu Gou et al., Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents, 2025
