# SAP-SAM
Repository for the SAP-SAM model.

![Overview of SAP-SAM](fig/region_alignment_framework.pdf "Overview of SAP-SAM") 

# Step 01
1. This step generates the textual descriptions for the human-parsing dataset.  
First, download the ATR dataset from [here](https://github.com/lemondan/HumanParsing-Dataset) and unzip it. Then run the following commands and wait for the script to finish.
```
cd s01_personsam_training

CUDA_VISIBLE_DEVICES=0 python transform_art_dataset.py --data_path /path/to/atr_dataset
```
Finally, divide the dataset into training, testing, and validation sets.
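
The README does not prescribe how to perform the split, so the following is only a minimal sketch. It assumes the descriptions can be loaded as a JSON list of records, and the 80/10/10 ratios, the fixed seed, and the helper names `split_records` and `write_splits` are illustrative choices, not part of this repository.

```python
import json
import random
from pathlib import Path


def split_records(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle records deterministically and cut them into train/test/dev lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_test = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        # The dev split takes whatever remains after train and test.
        "dev": shuffled[n_train + n_test:],
    }


def write_splits(splits, out_dir):
    """Write each split to atr_item_descriptions_<split>.json inside out_dir."""
    out_dir = Path(out_dir)
    for name, items in splits.items():
        path = out_dir / f"atr_item_descriptions_{name}.json"
        path.write_text(json.dumps(items, indent=2), encoding="utf-8")
```

If the script's output is not a flat JSON list (for example, a dictionary keyed by image name), convert it to a list of items before calling `split_records`.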


2. Name the three splits obtained in the previous step `atr_item_descriptions_train.json`, `atr_item_descriptions_test.json`, and `atr_item_descriptions_dev.json`, and organize the dataset directory as follows:

```
|-- path/to/atr_dataset/
|   |-- <JPEGImages>/
|       |-- 997_1.jpg
|       |-- 997_2.jpg
|       |-- 997_3.jpg
|       |-- ...
|   |-- <SegmentationClassAug>/
|       |-- 997_1.png
|       |-- 997_2.png
|       |-- 997_3.png
|       |-- ...
|   |-- atr_item_descriptions_train.json
|   |-- atr_item_descriptions_test.json
|   |-- atr_item_descriptions_dev.json
|   |-- atr_label.txt
```
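
Before starting training, it can help to confirm that the directory matches the tree above. The sketch below is a hypothetical helper, not a script shipped with this repository. It assumes the angle brackets in the tree are only notation, so the folders are literally named `JPEGImages` and `SegmentationClassAug`.

```python
from pathlib import Path

# Entries taken from the directory tree above (angle brackets assumed to be notation only).
REQUIRED_DIRS = ("JPEGImages", "SegmentationClassAug")
REQUIRED_FILES = (
    "atr_item_descriptions_train.json",
    "atr_item_descriptions_test.json",
    "atr_item_descriptions_dev.json",
    "atr_label.txt",
)


def missing_entries(data_path):
    """Return the expected directories and files that are not found under data_path."""
    root = Path(data_path)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

An empty list from `missing_entries("/path/to/atr_dataset")` means every expected entry is present.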

Then download the [BERT model](https://huggingface.co/bert-base-uncased/tree/main) and the [SAM model](https://huggingface.co/facebook/sam-vit-base/tree/main) from Hugging Face.
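
One possible way to fetch both checkpoints is the `snapshot_download` function from the `huggingface_hub` package. This is a sketch rather than the authors' procedure: the repository ids are read from the two links above, while the `./pretrained` folder and the `download_models` helper are hypothetical.

```python
# Repository ids taken from the two links above; the dictionary keys are only labels
# that mirror the placeholder paths used in the training command.
MODEL_REPOS = {
    "bert_path": "bert-base-uncased",
    "sam_path": "facebook/sam-vit-base",
}


def download_models(target_root="./pretrained"):
    """Download both checkpoints and return a dict mapping label to local path."""
    # Imported lazily so this module loads even if huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    paths = {}
    for name, repo_id in MODEL_REPOS.items():
        # target_root is a hypothetical location; any writable folder works.
        local_dir = f"{target_root}/{name}"
        paths[name] = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return paths
```

The returned paths can then be passed to `--sam_path` and `--language_model_path`.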

Then run the following command and wait for training to finish.

```
CUDA_VISIBLE_DEVICES=0 python tuning_sam_on_atr_description.py --data_path /path/to/atr_dataset --sam_path /path/to/sam_path --language_model_path /path/to/bert_path --batch_size 6 --num_epochs 20
```

# Step 02
First, prepare each of the three datasets as described below.
## 2.1 Prepare TBPR Datasets
### 2.1.1 CUHK-PEDES
Download the dataset from [here](https://github.com/ShuangLI59/Person-Search-with-Natural-Language-Description) and organize it as follows:
```
|-- dataset_path/
|   |-- <CUHK-PEDES>/
|       |-- imgs
|           |-- cam_a
|           |-- cam_b
|           |-- ...
|       |-- reid_raw.json
```

### 2.1.2 ICFG-PEDES
Download the dataset from [here](https://github.com/zifyloo/SSAN) and organize it as follows:
```
|-- dataset_path/
|   |-- <ICFG-PEDES>/
|       |-- imgs
|           |-- test
|           |-- train
|       |-- ICFG_PEDES.json
```

### 2.1.3 RSTPReid
Download the dataset from [here](https://github.com/NjtechCVLab/RSTPReid-Dataset) and organize it as follows:
```
|-- dataset_path/
|   |-- <RSTPReid>/
|       |-- imgs
|       |-- data_captions.json
```

## 2.2 Extract Phrases from the Datasets

Run the following commands:

```
cd s02_generate_relationshaips

CUDA_VISIBLE_DEVICES=0 python generate_mask.py --data_path /path/to/dataset_path --dataset "CUHK-PEDES" --bert_path /path/to/bert_path --sam_path /path/to/sam_path --trained_sam /path/to/trained_sam_in_s01 --step 1000000
```

Run this step for all three datasets by changing the `--dataset` flag.
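
To avoid retyping the command, the three runs could be scripted. The sketch below is an assumption-laden example, not part of this repository: only `"CUHK-PEDES"` appears in the command above, and the other two `--dataset` values are assumed to match the dataset directory names. `build_command` and `run_all` are hypothetical helpers.

```python
import os
import subprocess

# "CUHK-PEDES" comes from the command above; the other two values are assumed
# to match the dataset directory names.
DATASETS = ("CUHK-PEDES", "ICFG-PEDES", "RSTPReid")


def build_command(dataset, data_path, bert_path, sam_path, trained_sam, step=1000000):
    """Return the generate_mask.py argument list for one dataset."""
    return [
        "python", "generate_mask.py",
        "--data_path", data_path,
        "--dataset", dataset,
        "--bert_path", bert_path,
        "--sam_path", sam_path,
        "--trained_sam", trained_sam,
        "--step", str(step),
    ]


def run_all(data_path, bert_path, sam_path, trained_sam, gpu="0"):
    """Run generate_mask.py once per dataset, stopping at the first failure."""
    # Must be called from the directory that contains generate_mask.py.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for dataset in DATASETS:
        cmd = build_command(dataset, data_path, bert_path, sam_path, trained_sam)
        subprocess.run(cmd, check=True, env=env)
```

Because `check=True` raises on a non-zero exit code, a failure on one dataset stops the loop before the next one starts.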

After this step, the dataset directory should be organized as follows:

```
|-- dataset_path/
|   |-- <CUHK-PEDES>/
|       |-- imgs
|           |-- cam_a
|           |-- cam_b
|           |-- ...
|       |-- segs
|           |-- 0__CUHK01_0363004_0.png
|           |-- 0__CUHK01_0363004_1.png
|           |-- ...
|       |-- reid_raw.json
|       |-- CUHK-PEDES_data_final_68126.json
|       |-- CUHK-PEDES_score_dict_68125.json
|   |-- <ICFG-PEDES>/
|       |-- imgs
|           |-- test
|           |-- train
|       |-- segs
|           |-- 0__test_0627_0627_010_05_0303afternoon_1591_0_0.png
|           |-- 0__test_0627_0627_010_05_0303afternoon_1591_0_1.png
|           |-- ...
|       |-- ICFG_PEDES.json
|       |-- ICFG-PEDES_data_final_34674.json
|       |-- ICFG-PEDES_score_dict_34673.json
|   |-- <RSTPReid>/
|       |-- imgs
|       |-- segs
|           |-- 0_0000_c14_0031_0.png
|           |-- 0_0000_c14_0031_1.png
|           |-- ...
|       |-- data_captions.json
|       |-- RSTPReid_data_final_37010.json
|       |-- RSTPReid_score_dict_37009.json

```

# Step 03
Finally, run the following commands to train the model:

```
cd s03_semantic_alignment

CUDA_VISIBLE_DEVICES=0 python train.py --name sapsam --img_aug --batch_size 64 --MLM --loss_names 'sdm+mlm_part+matching' --dataset_name 'CUHK-PEDES' --root_dir ./  --num_epoch 60 --part_seg --part_mask_prob 0.35
```