<h1 align="center"> Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models </h1> 

We introduce ScoreRS, a quality assessment model trained on carefully curated large-scale remote sensing vision-language preference data. ScoreRS effectively scores and filters vision-language datasets, improving model performance by selecting high-quality data for training.

## Table of Contents
- [Environment Setup](#environment-setup)
  - [Basic Environment](#basic-environment)
  - [VLLM Environment](#vllm-environment)
  - [RL Environment](#rl-environment)
- [Training](#training)
  - [Score Model Training](#score-model-training)
  - [CLIP Training](#clip-training)
  - [Large VLM Training](#large-vlm-training)
  - [RL Training](#rl-training)
- [Evaluation](#evaluation)
  - [CLIP Evaluation](#clip-evaluation)
  - [LVLM Evaluation](#lvlm-evaluation)
  - [BoN Evaluation](#bon-evaluation)


## Environment Setup

### Basic Environment

Use this environment for score model training, inference, demonstrations, and fine-tuning CLIP and Qwen2VL. Unless a section says otherwise, this is the environment to use.

~~~shell
conda create -n scorers python==3.12 -y
conda activate scorers

cd scorers  # Important: run the following command from the project directory.
bash basic_env_setup.sh
~~~

### VLLM Environment
Use this environment for BoN prediction.
~~~shell
conda create -n scorers_vllm python==3.12 -y
conda activate scorers_vllm

cd scorers  # Important: run the following command from the project directory.
bash vllm_env_setup.sh
~~~

### RL Environment
Use this for RL training.
~~~shell
conda create -n verl python==3.12 -y
conda activate verl

cd scorers/customVeRL
pip install torch
pip install -e .
pip install pandas fastparquet
~~~
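
Since each workflow below expects a specific environment, a small guard at the top of a script can catch a wrong activation early. The sketch below relies on `CONDA_DEFAULT_ENV`, which conda exports when an environment is activated; the helper name `require_env` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repository): succeed only if the
# currently active conda environment matches the expected name.
require_env() {
    expected="$1"
    # conda exports CONDA_DEFAULT_ENV when an environment is activated.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# Example: guard a script that needs the VLLM environment.
# require_env scorers_vllm || exit 1
```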

## Training

### Score Model Training

+ Follow the instructions in these shell scripts for the first, second, and third training stages: [Stage1](./script/train_reward_first_stage.sh), [Stage2](./script/train_reward_second_stage.sh), [Stage3](./script/train_reward_third_stage.sh).
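
Once the three stage scripts have been edited according to their inline instructions, they could be chained so that a failure in one stage stops the run. This is only a sketch: it assumes each script runs standalone from the project root with no arguments, which the scripts themselves should confirm. `run_stages` and `RUNNER` are hypothetical names.

```shell
# Hypothetical wrapper: run the three stage scripts in order and stop at the
# first failure. RUNNER defaults to bash; set RUNNER=echo to preview the calls.
run_stages() {
    for stage in first second third; do
        "${RUNNER:-bash}" "./script/train_reward_${stage}_stage.sh" || return 1
    done
}

# Example: preview the order without launching any training.
# RUNNER=echo run_stages
```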

### CLIP Training

+ Follow the instructions in this [shell script](./script/train_clip_remoteclip.sh).

### Large VLM Training

+ Follow the instructions in these two shell scripts to fine-tune large VLMs: [pretrain](./script/train_pretrain.sh), [sft](./script/train_sft.sh).

### RL Training

+ First, launch the score model server:

    ~~~shell
    cd customVeRL/examples/reward_function
    
    uvicorn reward_server:app --host 0.0.0.0 --port 8000
    ~~~

+ Then follow the instructions in this shell script: [RL Training](./customVeRL/examples/train_grpo.sh)
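
Because training depends on the score model server, it can help to confirm the server accepts connections before starting the training script. The sketch below polls the port using bash's `/dev/tcp` redirection, so it needs bash rather than a plain POSIX shell. `wait_for_port` is a hypothetical helper; the port mirrors the `uvicorn` command above.

```shell
# Hypothetical helper: poll host:port once per second until a TCP connection
# succeeds or the retry budget runs out. Requires bash (uses /dev/tcp).
wait_for_port() {
    host="$1"; port="$2"; retries="${3:-30}"
    while [ "$retries" -gt 0 ]; do
        # The subshell opens the connection and closes it on exit;
        # connection errors are silenced.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        retries=$((retries - 1))
        sleep 1
    done
    echo "No server listening on $host:$port" >&2
    return 1
}

# Example: wait up to 60 seconds for the score model server.
# wait_for_port 127.0.0.1 8000 60 || exit 1
```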

## Evaluation

### CLIP Evaluation

+ Classification:

    Please refer to [this file](./python_script/evaluation/clip_classification.py) for evaluating CLIP on classification tasks.

+ Retrieval:

    Please refer to [this file](./python_script/evaluation/eval_retrieval.py) for evaluating CLIP on retrieval tasks.

### LVLM Evaluation

+ First, download and unzip our evaluation dataset, which will be published after the anonymity period.

+ To evaluate our Qwen2VL-RS series (shell script):

    + Qwen2VL-RS

        ~~~shell
        SCRIPT_PATH=./python_script/evaluation/rs_evaluation.py
        DATA_ROOT="Your path to unzip folder"
        OUTPUT_DIR="Your path to eval log file"
        model_type=lmdeploy
        MODEL_PATH= # path to Qwen2VL-RS
        
        CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 --mixed_precision bf16 $SCRIPT_PATH \
            --data_root $DATA_ROOT \
            --output_dir $OUTPUT_DIR \
            --model_type $model_type \
            --model_path $MODEL_PATH \
            --force_inference true \
            --task all
        ~~~

    + Qwen2VL-RS-R1

        ~~~shell
        ...  # same as above
        MODEL_PATH= # path to Qwen2VL-RS-R1
        REASONING_CONFIG=./python_script/evaluation/qwen2_thinking_template.json
        
        CUDA_VISIBLE_DEVICES=0 accelerate launch --num_processes 1 --mixed_precision bf16 $SCRIPT_PATH \
            --data_root $DATA_ROOT \
            --output_dir $OUTPUT_DIR \
            --model_type $model_type \
            --model_path $MODEL_PATH \
            --force_inference true \
            --task all \
            --reasoning_config $REASONING_CONFIG
        ~~~

+ Qwen2-VL, LLaVA-1.6, and InternVL-2.5

    ~~~shell
    ...  # same as eval on Qwen2VL-RS
    model_type=lmdeploy
    MODEL_PATH=Qwen/Qwen2-VL-7B-Instruct  # liuhaotian/llava-v1.6-vicuna-7b or OpenGVLab/InternVL2_5-8B
    ... # same as eval on Qwen2VL-RS
    ~~~

+ GeoChat, VHM, and SkysenseGPT

    ~~~shell
    ...  # same as eval on Qwen2VL-RS
    model_type=geochat  # vhm or skysensegpt
    MODEL_PATH=MBZUAI/geochat-7B  # FitzPC/vhm_7B or ll-13/SkySenseGPT-7B-CLIP-ViT
    ... # same as eval on Qwen2VL-RS
    ~~~

+ LHRS-Bot-Nova

    + First, download the converted Hugging Face-style checkpoint from [here](https://huggingface.co/LHRS/LHRS-Bot-Nova/tree/main/Stage3_HF)

        ~~~shell
        ...  # same as eval on Qwen2VL-RS
        model_type=lhrs
        MODEL_PATH="your_path_to FINAL.pt"  # Important: must point to the FINAL.pt file, and TextLoRA must be in the same folder as FINAL.pt
        ... # same as eval on Qwen2VL-RS
        ~~~
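
The baseline runs above differ only in `model_type` and `MODEL_PATH`, so they can be driven from one loop. The sketch below reuses only the flags shown in the commands above. The `run_eval` helper and the `DRY_RUN` switch are hypothetical conveniences, and writing each model's logs to its own subdirectory is an assumption, not something the evaluation script is known to require. `DRY_RUN` defaults to 1, so the loop only prints the commands until it is set to 0.

```shell
SCRIPT_PATH=./python_script/evaluation/rs_evaluation.py
DATA_ROOT="Your path to unzip folder"
OUTPUT_DIR="Your path to eval log file"

# Hypothetical helper: run (or, with DRY_RUN=1, only print) one evaluation.
run_eval() {
    model_type="$1"; model_path="$2"
    # Assumption: one output subdirectory per model, named after the model.
    out_dir="$OUTPUT_DIR/$(basename "$model_path")"
    set -- accelerate launch --num_processes 1 --mixed_precision bf16 "$SCRIPT_PATH" \
        --data_root "$DATA_ROOT" \
        --output_dir "$out_dir" \
        --model_type "$model_type" \
        --model_path "$model_path" \
        --force_inference true \
        --task all
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"
    else
        mkdir -p "$out_dir"
        CUDA_VISIBLE_DEVICES=0 "$@"
    fi
}

# model_type:model_path pairs taken from the sections above.
for pair in \
    "lmdeploy:Qwen/Qwen2-VL-7B-Instruct" \
    "lmdeploy:liuhaotian/llava-v1.6-vicuna-7b" \
    "lmdeploy:OpenGVLab/InternVL2_5-8B" \
    "geochat:MBZUAI/geochat-7B" \
    "vhm:FitzPC/vhm_7B" \
    "skysensegpt:ll-13/SkySenseGPT-7B-CLIP-ViT"
do
    # Split each pair at the first colon into type and path.
    run_eval "${pair%%:*}" "${pair#*:}"
done
```

After checking the printed commands, rerun with `DRY_RUN=0` to execute them. LHRS-Bot-Nova is left out of the loop because its `MODEL_PATH` must point to a local `FINAL.pt` file.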

### BoN Evaluation

+ Make sure you are in the VLLM environment.
    + For BoN evaluation on LHRS-Bench, please refer to [this file](./python_script/evaluation/lhrs_bench_bon.py).
    + For BoN evaluation on VG-DIOR, please refer to [this file](./python_script/evaluation/vg_bon.py).