# HashPose: Memory-Efficient Human Pose Estimation via Progressive Hash Codes
## News
- Check out our recent work on HashPose!


## Introduction
This is a PyTorch implementation of HashPose for evaluation.

Real-time human pose estimation on edge devices demands memory-efficient, high-precision methods. The dominant heatmap approaches, however, scale quadratically with input size, waste computation on background regions, and require slow post-processing. We propose HashPose, a framework that replaces heatmaps with progressive hash codes: each keypoint is a binary sequence in which successive bits refine localization. This direct bit prediction avoids dense heatmap-style background computation and removes the need for argmax or non-maximum suppression to decode coordinates. Furthermore, HashPose uses image classification backbones without upsampling layers, achieving high accuracy while significantly improving speed.

We validate HashPose across a wide range of model sizes, from an efficient 3.5M-parameter HashPose-XT model with 0.82 ms frame latency and 85.9% AP^{50} on COCO to a 196.8M-parameter Large model that achieves a state-of-the-art 91.9% AP^{50} with 5.57 ms frame latency using only the COCO training set. HashPose also uses 510x less output memory than heatmap configurations (0.48 MB vs 244.8 MB) for a typical 256×192 input, enabling high-throughput pose analysis that maintains high practical precision for on-device applications. Its discrete representation is also inherently suited to integer-only quantization, offering a clear path to further hardware acceleration on edge devices.
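
To make the idea of a coarse-to-fine binary code concrete, here is a minimal sketch that encodes one normalized coordinate by repeated bisection, so that each extra bit halves the remaining interval. It is an illustration only and may differ from the encoding used in this repository: the function names `encode_coordinate` and `decode_coordinate`, the bisection scheme, and the choice of 8 bits are all assumptions.

```python
def encode_coordinate(x: float, num_bits: int) -> list[int]:
    """Encode a normalized coordinate in [0, 1) as a coarse-to-fine bit list.

    Hypothetical helper: bit k selects the lower (0) or upper (1) half of
    the interval left by bits 0..k-1.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    lo, hi = 0.0, 1.0
    bits = []
    for _ in range(num_bits):
        mid = (lo + hi) / 2.0
        if x >= mid:
            bits.append(1)
            lo = mid
        else:
            bits.append(0)
            hi = mid
    return bits


def decode_coordinate(bits: list[int]) -> float:
    """Return the center of the interval selected by the bit list."""
    lo, hi = 0.0, 1.0
    for bit in bits:
        mid = (lo + hi) / 2.0
        if bit:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Each additional bit halves the worst-case localization error.
code = encode_coordinate(0.3, num_bits=8)
for k in (2, 4, 8):
    print(k, code[:k], decode_coordinate(code[:k]))
```

In this sketch decoding is a fixed-length loop over the bits with no search over a dense grid, and any prefix of the code is itself a valid, coarser estimate.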


## Environment
The code is developed using Python 3.12 and CUDA 12.4 on Ubuntu 22.04. An NVIDIA GPU is required. The code has been developed and tested on NVIDIA H20 and 3090 GPUs; other platforms and GPUs are not fully tested.
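
As a quick sanity check of the setup above, the following sketch prints the Python version and, if PyTorch is already installed, its version, the CUDA version it was built against, and whether a GPU is visible. The helper name `describe_environment` is hypothetical and not part of this repository.

```python
import platform


def describe_environment() -> dict:
    """Collect the versions relevant to the environment described above."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    try:
        import torch  # imported lazily so the check also runs before installation
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```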


## Quick start
### Installation
1. Install PyTorch >= v2.5.0 following the [official instructions](https://pytorch.org/).
2. Clone this repo. We refer to the cloned directory as ${POSE_ROOT}.
3. Install dependencies:
   ```
   pip install -r requirements.txt
   ```
4. Download the pretrained models ([GoogleDrive](https://drive.google.com/file/d/186vSsgwTBvkmU8CmZ-zNOLBdsBpRskjx/view?usp=sharing)) and the detection file ([GoogleDrive](https://drive.google.com/file/d/10Q98q07Qsaqw-ad2cf7zqMxNKku020VZ/view?usp=sharing)), and place them in ${POSE_ROOT}.
5. Your directory tree should look like this:

   ```
   ${POSE_ROOT}
   ├── data
   ├── experiments
   ├── lib
   ├── tools
   ├── HashPose_L.ep
   ├── COCO_detections.json
   ├── README.md
   └── requirements.txt
   ```

   
### Data preparation
**For COCO data**, please download from [COCO download](http://cocodataset.org/#download). The 2017 Train/Val sets are needed for COCO keypoints training and validation. Download and extract them under ${POSE_ROOT}/data so that the layout looks like this:
```
${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 
```
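
Before running the evaluation, it can help to confirm that the layout above is in place. The sketch below checks only the annotation files and image folders named in the tree. The helper `missing_paths` and the constant `EXPECTED_PATHS` are hypothetical names, not part of this repository.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to ${POSE_ROOT}.
EXPECTED_PATHS = (
    "data/coco/annotations/person_keypoints_train2017.json",
    "data/coco/annotations/person_keypoints_val2017.json",
    "data/coco/images/train2017",
    "data/coco/images/val2017",
)


def missing_paths(pose_root: str) -> list[str]:
    """Return the expected COCO paths that do not exist under pose_root."""
    root = Path(pose_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]
```

For example, `missing_paths(".")` called from ${POSE_ROOT} returns an empty list when everything is in place, and otherwise lists the entries that still need to be downloaded or extracted.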

### Testing

#### Testing on COCO val2017 dataset

```
python tools/throughput_precision_measure.py --cfg experiments/coco/hashpose/hashpose_large_256_192.yaml TEST.USE_GT_BBOX False
```


This should report the per-frame latency and the COCO evaluation results:

```
* AP@50 91.9 Latency 5.6 milliseconds
```


