Scale-aware attention-based multi-resolution representation for multi-person pose estimation

Multim. Syst. 2022 (modified: 10 Nov 2022)
Abstract: The performance of multi-person pose estimation has improved significantly with the development of deep convolutional neural networks. However, two challenging issues remain largely ignored, and both are key factors in degraded keypoint localization: scale variation of human body parts, and the substantial information loss caused by consecutive striding across multiple upsampling stages. In this paper, we present a novel network, the Scale-aware attention-based Multi-resolution representation Network (SaMr-Net), which aims to make the method robust to scale variation and to prevent the loss of detail during upsampling, leading to more precise keypoint estimation. The proposed architecture adopts the high-resolution network (HRNet) as its backbone. We first introduce dilated convolution into the backbone to expand the receptive field. Then, an attention-based multi-scale feature fusion module is devised to replace the exchange units in HRNet, allowing the network to learn the weight of each fusion component. Finally, we design a scale-aware keypoint regressor that gradually integrates features from low to high resolution, enhancing invariance to the varying scales of body parts in keypoint estimation. We demonstrate the superiority of the proposed algorithm on two benchmark datasets: (1) the MS COCO keypoint benchmark, and (2) the MPII human pose dataset. The comparison shows that our approach achieves superior results.
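The abstract's central idea of attention-based multi-scale fusion can be sketched in miniature: branch features at different resolutions are upsampled to a common size and combined with learned per-branch weights rather than summed uniformly. The sketch below is a minimal NumPy illustration under assumed simplifications (nearest-neighbor upsampling, scalar per-branch attention logits in place of learned attention maps); the function names `attention_fuse` and `upsample_nearest` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def upsample_nearest(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def attention_fuse(features, logits):
    """Fuse multi-resolution features with attention weights.

    features: list of (C, H_i, W_i) arrays, features[0] at the highest resolution
    logits:   one raw attention score per branch (learned in a real network;
              here supplied directly for illustration)
    Returns a (C, H_0, W_0) weighted sum of the upsampled branch features.
    """
    target_h = features[0].shape[1]
    ups = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    w = softmax(np.asarray(logits, dtype=float))
    return sum(wi * fi for wi, fi in zip(w, ups))

# Two branches: an 8x8 high-resolution map and a 4x4 low-resolution map.
high = np.ones((2, 8, 8))
low = 3 * np.ones((2, 4, 4))
fused = attention_fuse([high, low], [0.0, 0.0])  # equal logits -> simple average
```

With equal logits the fusion reduces to a plain average (here every output value is 2.0); in the paper's module the weights are learned end to end, so the network can emphasize whichever resolution carries the most useful detail for each fusion point.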