Abstract: Animal pose estimation has received increasing attention in recent years. The main challenge of this task is the wide diversity of animal species, in contrast to human pose estimation, which deals with a single species. To address this issue, we design KITPose, a keypoint-interactive Transformer model for high-resolution animal pose estimation. Because a high-resolution network preserves local perception and the self-attention module of a Transformer excels at capturing long-range dependencies, we equip the high-resolution network with a Transformer to increase model capacity and to enable keypoint interaction at the decision stage. In addition, to better fit the pose estimation task, we train the model parameters and the joint weights simultaneously, so that the loss weight of each keypoint is adjusted automatically. Experimental results on the AP10K and ATRW datasets demonstrate the merits of KITPose and its superior performance over state-of-the-art approaches.
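
The abstract does not state how the joint weights are parameterized or updated. As a rough illustration of training loss weights alongside model parameters, the sketch below uses a generic uncertainty-style weighting, where each keypoint k has a learnable scalar s_k and contributes exp(-s_k) * L_k + s_k to the total loss. This formulation, the function names, and the learning rate are assumptions chosen for illustration; they are not taken from KITPose.

```python
import math

# Hypothetical sketch: learnable per-keypoint loss weights.
# Assumed form (not from the paper): total = sum_k exp(-s_k) * L_k + s_k,
# where L_k is the loss of keypoint k and s_k is a learnable scalar.

def weighted_keypoint_loss(per_joint_losses, log_vars):
    """Total loss under the assumed weighting scheme."""
    return sum(math.exp(-s) * l + s for l, s in zip(per_joint_losses, log_vars))

def log_var_gradients(per_joint_losses, log_vars):
    """Analytic gradient of the total loss w.r.t. each s_k: 1 - exp(-s_k) * L_k."""
    return [1.0 - math.exp(-s) * l for l, s in zip(per_joint_losses, log_vars)]

def update_log_vars(per_joint_losses, log_vars, lr=0.1):
    """One plain gradient-descent step on the learnable scalars s_k."""
    grads = log_var_gradients(per_joint_losses, log_vars)
    return [s - lr * g for s, g in zip(log_vars, grads)]

def joint_weights(log_vars):
    """Effective loss weight of each keypoint, exp(-s_k)."""
    return [math.exp(-s) for s in log_vars]

if __name__ == "__main__":
    losses = [0.5, 2.0]        # made-up per-keypoint losses for two joints
    s = [0.0, 0.0]             # all weights start at exp(0) = 1
    s = update_log_vars(losses, s, lr=0.1)
    print(joint_weights(s))    # weight of the low-loss joint rises, the other falls
```

In a real training loop the scalars s_k would be registered as parameters of the network and updated by the same optimizer as the model weights; the manual gradient step here only makes the mechanism explicit.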