# Human Pose Estimation with Token-Centric Adaptive Attention

## Environment
The code was developed with Python 3.8 on Windows 11 and tested on NVIDIA RTX 3090 GPUs. Other platforms have not been fully tested.

## Usage
### Installation
1. Clone this repo.
2. Set up the conda environment:
   ```
   conda create -n PCT python=3.8 -y
   conda activate PCT
   pip install -r requirements.txt
   ```
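After installation, a quick sanity check can confirm the interpreter version and whether a GPU is visible. The snippet below is a minimal sketch; `describe_environment` is a hypothetical helper, not part of this repo. It assumes PyTorch is among the packages in `requirements.txt` and simply reports `torch: None` if PyTorch cannot be imported.

```python
import platform
import sys


def describe_environment() -> dict:
    """Report the Python version and, if PyTorch is importable, CUDA visibility.

    Hypothetical helper: assumes torch is installed via requirements.txt.
    """
    info = {
        "python": platform.python_version(),
        "python_is_3_8": sys.version_info[:2] == (3, 8),
    }
    try:
        import torch  # assumption: provided by requirements.txt
    except ImportError:
        info["torch"] = None
        info["cuda_available"] = False
    else:
        info["torch"] = torch.__version__
        info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```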

### Data Preparation

Download the COCO 2017 train/val images and annotations from the [COCO download](http://cocodataset.org/#download) page. The person detection results can be obtained from the [HRNet](https://github.com/leoxiaobin/deep-high-resolution-net.pytorch) repository. The resulting data directory should look like this:

    ${POSE_ROOT}
    |-- data
    `-- |-- coco
        `-- |-- annotations
            |   |-- person_keypoints_train2017.json
            |   `-- person_keypoints_val2017.json
            |-- person_detection_results
            |   |-- COCO_val2017_detections_AP_H_56_person.json
            |   |-- COCO_test-dev2017_detections_AP_H_609_person.json
            `-- images
                |-- train2017
                |   |-- 000000000009.jpg
                |   |-- 000000000025.jpg
                |   |-- 000000000030.jpg
                |   |-- ... 
                `-- val2017
                    |-- 000000000139.jpg
                    |-- 000000000285.jpg
                    |-- 000000000632.jpg
                    |-- ... 
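To catch a misplaced file before a long training run, the expected layout can be checked programmatically. This is a minimal sketch: `missing_coco_paths` and `REQUIRED_COCO_PATHS` are hypothetical names, the relative paths are copied from the tree above, and the image folders are only checked for existence, not contents.

```python
import sys
from pathlib import Path

# Paths relative to ${POSE_ROOT}/data/coco, copied from the tree above.
REQUIRED_COCO_PATHS = [
    "annotations/person_keypoints_train2017.json",
    "annotations/person_keypoints_val2017.json",
    "person_detection_results/COCO_val2017_detections_AP_H_56_person.json",
    "person_detection_results/COCO_test-dev2017_detections_AP_H_609_person.json",
    "images/train2017",
    "images/val2017",
]


def missing_coco_paths(pose_root: str) -> list:
    """Return the expected COCO files/folders that do not exist under pose_root."""
    coco_root = Path(pose_root) / "data" / "coco"
    return [rel for rel in REQUIRED_COCO_PATHS if not (coco_root / rel).exists()]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_coco_paths(root)
    if missing:
        print("Missing under data/coco:")
        for rel in missing:
            print("  " + rel)
    else:
        print("COCO layout looks complete.")
```

Pass `${POSE_ROOT}` as the first argument, or run the script from that directory.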

### Model Preparation
To use this codebase, we provide the following models and tools:
1. SimMIM Pretrained Backbone: We provide SimMIM-pretrained Swin [models](https://pan.baidu.com/s/1QR2-DWW6GRk5yr0mNMPR7w?pwd=juyp) for download. Alternatively, you can use the [SimMIM](https://github.com/microsoft/SimMIM) repository to pretrain your own models. (Note: When loading a SimMIM checkpoint, it is normal to see missing keys in the source state_dict, including `relative_coords_table`, `relative_position_index`, and `norm3`. These missing keys do not affect the results.)
2. Heatmap-Trained Backbone: We offer Swin [models](https://pan.baidu.com/s/1QR2-DWW6GRk5yr0mNMPR7w?pwd=juyp) trained on the COCO dataset with heatmap supervision.
3. [Optional] Well-Trained Tokenizers: Well-trained PCT tokenizers can be downloaded [here](https://pan.baidu.com/s/1QR2-DWW6GRk5yr0mNMPR7w?pwd=juyp).
4. [Optional] Well-Trained Pose Models: Well-trained PCT pose models can be downloaded [here](https://pan.baidu.com/s/1QR2-DWW6GRk5yr0mNMPR7w?pwd=juyp).
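Since some missing keys are expected when loading SimMIM weights, it helps to separate those from keys that would indicate a real mismatch. The helper below is a hypothetical sketch, not part of this repo: it matches on the key fragments named in the note above, using plain substring matching, which is an assumption about how those keys are named.

```python
# Key fragments the note above describes as harmless when loading SimMIM weights.
BENIGN_FRAGMENTS = ("relative_coords_table", "relative_position_index", "norm3")


def split_missing_keys(missing_keys):
    """Split missing keys into (benign, suspicious) lists.

    Hypothetical helper; uses simple substring matching on BENIGN_FRAGMENTS.
    """
    benign, suspicious = [], []
    for key in missing_keys:
        if any(fragment in key for fragment in BENIGN_FRAGMENTS):
            benign.append(key)
        else:
            suspicious.append(key)
    return benign, suspicious
```

With PyTorch, `load_state_dict(..., strict=False)` returns an object whose `missing_keys` attribute can be passed in, e.g. `benign, suspicious = split_missing_keys(backbone.load_state_dict(state_dict, strict=False).missing_keys)`, where `backbone` and `state_dict` are placeholders. A non-empty `suspicious` list is worth investigating.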

After completing the above steps, your models directory should look like this:

    ${POSE_ROOT}
    |-- weights
    `-- |-- simmim
        |   `-- swin_base.pth
        |-- heatmap
        |   `-- swin_base.pth
        |-- tokenizer [Optional]
        |   `-- CM_PCT_tokenzier.pth
        `-- pct [Optional]
            `-- CM_PCT.pth 

### PCT

#### Stage I: Training Tokenizer

```
python tools/train.py configs/pct_base_tokenizer.py --gpu-ids 0
```
After training the tokenizer, move the checkpoint from `work_dirs/pct_base_tokenizer/epoch_30.pth` to `weights/tokenizer/CM_PCT_tokenzier.pth` and proceed to the next stage. Alternatively, point the classifier config at the checkpoint directly by passing `--cfg-options model.keypoint_head.tokenizer.ckpt=work_dirs/pct_base_tokenizer/epoch_30.pth` when training the classifier.
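The checkpoint hand-off can be scripted so the destination folder is created if needed. This is a minimal sketch with a hypothetical `install_tokenizer` helper, using the two paths given above and assuming it is run from `${POSE_ROOT}`. It copies rather than moves, so the original stays in `work_dirs` for the `--cfg-options` route; use `shutil.move` instead to follow the instructions literally.

```python
import shutil
from pathlib import Path

# Paths from the instructions above; assumes the current directory is ${POSE_ROOT}.
SRC = Path("work_dirs/pct_base_tokenizer/epoch_30.pth")
DST = Path("weights/tokenizer/CM_PCT_tokenzier.pth")


def install_tokenizer(src: Path = SRC, dst: Path = DST) -> Path:
    """Copy the trained tokenizer checkpoint into the weights directory (hypothetical helper)."""
    if not src.is_file():
        raise FileNotFoundError(f"tokenizer checkpoint not found: {src}")
    dst.parent.mkdir(parents=True, exist_ok=True)  # create weights/tokenizer if absent
    shutil.copy2(src, dst)  # copy, keeping the original in work_dirs
    return dst


if __name__ == "__main__":
    print(f"Installed tokenizer at {install_tokenizer()}")
```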

#### Stage II: Training Classifier

```
python tools/train.py configs/pct_base_classifier.py --gpu-ids 0
```

Finally, you can test your model using the script below.
```
python tools/test.py configs/pct_base_classifier.py work_dirs/pct_base_classifier/epoch_270.pth --eval mAP --cfg-options data.test.data_cfg.use_gt_bbox=False
```
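Both `--cfg-options` uses in this README override a nested config entry with a dotted key. Assuming the repo follows the MMCV-style config convention (the flag suggests so, but the README does not say), each `a.b.c=value` pair is applied roughly as sketched below. `parse_value` and `apply_cfg_options` are hypothetical illustrations, not the library's actual code. Setting `use_gt_bbox=False` presumably makes evaluation use the person detection results from data preparation instead of ground-truth boxes.

```python
from copy import deepcopy


def parse_value(text: str):
    """Best-effort conversion of a command-line string to a Python value."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered == "none":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text  # fall back to the raw string (e.g. a checkpoint path)


def apply_cfg_options(cfg: dict, options: list) -> dict:
    """Return a copy of cfg with each 'a.b.c=value' override applied."""
    merged = deepcopy(cfg)
    for option in options:
        dotted_key, _, raw_value = option.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})  # create intermediate dicts as needed
        node[leaf] = parse_value(raw_value)
    return merged
```

For example, `apply_cfg_options(cfg, ["data.test.data_cfg.use_gt_bbox=False"])` sets `cfg["data"]["test"]["data_cfg"]["use_gt_bbox"]` to the boolean `False` in the returned copy and leaves the original dict untouched.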



