HRNeXt: High-Resolution Context Network for Crowd Pose EstimationDownload PDFOpen Website

Published: 01 Jan 2023, Last Modified: 10 Nov 2023IEEE Trans. Multim. 2023Readers: Everyone
Abstract: Occlusion handling in crowded scenes is an intractable challenge for human pose estimation. To address this problem, we propose two novel feed-forward network structures named Global Feed-Forward Network (GFFN) and Dynamic Feed-Forward Network (DFFN), which are specifically designed for image-based tasks to capture both local and global contextual information within intermediate features and update feature representations with high adaptability for occlusions. By exploiting the context modeling ability of the proposed GFFN and DFFN, we present a novel backbone network, namely High-Resolution Context Network (HRNeXt), which learns high-resolution representations with abundant contextual information to better estimate poses of occluded human bodies. Compared to state-of-the-art pose estimation networks, our HRNeXt absorbs advantages of convolution operation and attention mechanism, and it is more efficient in terms of training data sizes, network parameters and computational costs. Experimental results show that our HRNeXt significantly outperforms state-of-the-art backbone networks on challenging pose estimation datasets with high occurrence of crowds and occlusions.
0 Replies

Loading