# **Urban Spatiotemporal Reasoning Benchmark**

This repository provides a comprehensive framework for **urban spatiotemporal reasoning** tasks. It integrates **large language models (LLMs)** into classic urban science problems such as traffic signal control, route planning, and human mobility prediction.

It includes **spatiotemporal reasoning QA** tasks, which evaluate how well LLMs understand and reason over spatiotemporal data, and **downstream tasks**, which evaluate how well LLMs solve real-world urban problems.

---

## 🔧 Installation (Python 3.10)

```bash
# Step 1: Install PyTorch and related packages
pip install torch==2.5.1 torchaudio==2.5.1 torchvision==0.20.1

# Step 2: Install other dependencies
pip install -r requirements.txt

# Step 3: Install CityFlow for traffic simulation
conda install -c conda-forge cmake
pip install ./CityFlow
```

## Dataset Download
Download the datasets from [Hugging Face](https://huggingface.co/datasets/USTBench/USTBench), then copy them into the `UST_tasks` folder:
```bash
cp -r /path/to/dataset/question_answering/Data /path/to/USTBench/UST_tasks/question_answering/
cp -r /path/to/dataset/next_poi_prediction/Data /path/to/USTBench/UST_tasks/next_poi_prediction/
```
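
Before launching a task, it can help to confirm that the copied folders ended up where the commands above place them. The following is a minimal sketch and not part of the repository: the helper name `missing_data_dirs` is hypothetical, and it checks only the two `Data` folders named in the copy commands.

```python
from pathlib import Path

# Data folders named in the copy commands above, relative to the repository root.
EXPECTED_DATA_DIRS = (
    "UST_tasks/question_answering/Data",
    "UST_tasks/next_poi_prediction/Data",
)


def missing_data_dirs(repo_root):
    """Return the expected Data folders that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_DATA_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_data_dirs(".")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders are present.")
```

If other tasks ship their own data, their folders could be appended to `EXPECTED_DATA_DIRS` in the same way.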

---

## 🚀 Running a Task

### General Format

```bash
python run_UST_tasks.py --task <task_name> \
                        --batch_size <int> \
                        --llm_path_or_name <llm_model_path_or_hub_name> \
                        [--use_reflection true(default)/false ] \
                        [--other_task_specific_args]
```
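
When launching many runs from Python, the general format above can be assembled programmatically. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, not something the repository provides, and it simply turns keyword arguments into `--key value` pairs. It always passes `--use_reflection` explicitly, although the flag is optional.

```python
import shlex


def build_command(task, batch_size, llm_path_or_name, use_reflection=True, **task_args):
    """Assemble the argument list for run_UST_tasks.py in the general format above."""
    cmd = [
        "python", "run_UST_tasks.py",
        "--task", task,
        "--batch_size", str(batch_size),
        "--llm_path_or_name", llm_path_or_name,
        "--use_reflection", "true" if use_reflection else "false",
    ]
    # Task-specific arguments (for example location="Beijing") become --key value pairs.
    for key, value in task_args.items():
        cmd += [f"--{key}", str(value)]
    return cmd


cmd = build_command(
    "congestion_prediction",
    batch_size=32,
    llm_path_or_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    location="Beijing",
)
# Print the command as a single shell-safe string for inspection.
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues.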

### View Required Arguments

Each task has its own required parameters. You can inspect them by running:

```bash
python run_UST_tasks.py --task <task_name> --help
```

---

## 🧠 Task Examples

### 🎯 Process-based Reasoning Ability Evaluation QA

Example 1: QA constructed from prediction tasks

```bash
python run_UST_tasks.py --task question_answering \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --tasks "next_poi_prediction, congestion_prediction, socio_economic_prediction, traffic_od_prediction" \
                        --datasets "st_understanding, forecasting, reflection"
```

Example 2: QA constructed from decision-making tasks

```bash
python run_UST_tasks.py --task question_answering \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --tasks "traffic_signal_control, poi_placement, road_planning, route_planning, urban_planning" \
                        --datasets "st_understanding, planning, reflection"
```

Alternatively, run all QA tasks with the provided script:

```bash
bash ./scripts/run_spatiotemporal_reasoning_evaluation.sh
```

---

### 🔍 Downstream Task Examples

Each downstream task has specific required arguments:

#### Socio-Economic Prediction

```bash
python run_UST_tasks.py --task socio_ecomic_prediction \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --location "Guangzhou" \
                        --use_reflection true
```

#### Congestion Prediction

```bash
python run_UST_tasks.py --task congestion_prediction \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --location "Beijing" \
                        --use_reflection true
```

#### Road Planning

```bash
python run_UST_tasks.py --task road_planning \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --slum_name "CapeTown1" \
                        --use_reflection true
```

#### Urban Planning

```bash
python run_UST_tasks.py --task urban_planning \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --cfg "hlg" \
                        --use_reflection true
```

#### POI Placement

```bash
python run_UST_tasks.py --task poi_placement \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --location "Qiaonan" \
                        --use_reflection true
```

#### Traffic Signal Control

```bash
python run_UST_tasks.py --task traffic_signal_control \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --dataset "hangzhou" \
                        --traffic_file "anon_4_4_hangzhou_real.json" \
                        --use_reflection true
```

#### Traffic Flow Prediction

```bash
python run_UST_tasks.py --task traffic_od_prediction \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --location "Newyork" \
                        --use_reflection true
```

#### Route Planning

```bash
python run_UST_tasks.py --task route_planning \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --location "Manhattan" \
                        --use_reflection true
```

#### Human Mobility Prediction

```bash
python run_UST_tasks.py --task next_poi_prediction \
                        --batch_size 32 \
                        --llm_path_or_name deepseek-ai/DeepSeek-R1-Distill-Qwen-7B \
                        --location "Newyork" \
                        --use_reflection true
```

Alternatively, run all downstream tasks with the provided script:

```bash
bash ./scripts/run_downstream_tasks.sh
```

---

## 📌 Supported Tasks and Arguments

| Task Name                 | Required Arguments        |
| ------------------------- | ------------------------- |
| `question_answering`      | `tasks`, `datasets`       |
| `next_poi_prediction`     | `location`                |
| `poi_placement`           | `location`                |
| `congestion_prediction`   | `location`                |
| `route_planning`          | `location`                |
| `socio_ecomic_prediction` | `location`                |
| `traffic_signal_control`  | `dataset`, `traffic_file` |
| `traffic_od_prediction`   | `location`                |
| `road_planning`           | `slum_name`               |
| `urban_planning`          | `cfg`                     |
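
The table above can also be expressed as a lookup that checks a set of arguments before a run starts. This is an illustrative sketch only: `REQUIRED_ARGS` and `missing_required_args` are hypothetical names, the mapping is transcribed from the table (task names spelled exactly as listed there), and it is not the repository's `TASK_CONFIG`, whose actual structure is not shown here.

```python
# Required task-specific arguments, transcribed from the table above.
REQUIRED_ARGS = {
    "question_answering": ("tasks", "datasets"),
    "next_poi_prediction": ("location",),
    "poi_placement": ("location",),
    "congestion_prediction": ("location",),
    "route_planning": ("location",),
    "socio_ecomic_prediction": ("location",),
    "traffic_signal_control": ("dataset", "traffic_file"),
    "traffic_od_prediction": ("location",),
    "road_planning": ("slum_name",),
    "urban_planning": ("cfg",),
}


def missing_required_args(task, provided):
    """Return the required argument names for `task` that are absent from `provided`."""
    if task not in REQUIRED_ARGS:
        raise ValueError(f"Unknown task: {task!r}")
    return [name for name in REQUIRED_ARGS[task] if name not in provided]


# traffic_signal_control needs both dataset and traffic_file; only one is given here.
print(missing_required_args("traffic_signal_control", {"dataset": "hangzhou"}))
# prints ['traffic_file']
```

For the authoritative list of arguments, `python run_UST_tasks.py --task <task_name> --help` remains the reference.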

---

## 📎 Notes

* `--use_reflection`: enables reflective reasoning when set to `true` (the default).
* The model in any example can be replaced with any Hugging Face-compatible LLM or a local path to a fine-tuned model.
* To add a new task, extend `UST_tasks/` and update `TASK_CONFIG` in `utils/task_config.py`.
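
Because `--use_reflection` takes the literal strings `true` or `false`, the text has to be converted explicitly: `argparse` with `type=bool` would treat any non-empty string, including `false`, as true. The sketch below shows one common way to handle such a flag. It is an illustration only: `str2bool` is a hypothetical helper, and the repository's actual parsing code may differ.

```python
import argparse


def str2bool(text):
    """Convert 'true'/'false' (case-insensitive) to a bool for argparse."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected 'true' or 'false', got {text!r}")


parser = argparse.ArgumentParser()
# The default mirrors the general format above, where true is the default.
parser.add_argument("--use_reflection", type=str2bool, default=True)

args = parser.parse_args(["--use_reflection", "false"])
print(args.use_reflection)  # False
```

Rejecting anything other than `true` or `false` keeps a typo such as `--use_reflection flase` from silently enabling reflection.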
