Abstract: Pedestrian Attribute Recognition (PAR) is an indispensable topic in smart video analysis. Recognizing fine-grained attributes is challenging because they are hard to distinguish in surveillance images. In this study, we propose an Attention Auxiliary Spatial Fusion (AASF) model that improves PAR performance in two ways: (1) We employ an Embedded Attention (EA) module that encodes position information into channel information, so that features carrying small-scale visual clues can be aggregated along two different spatial directions. (2) We propose a Feature Pyramid Adaptive Fusion (FPAF) module that adaptively selects useful features for multiple attributes from different pyramid levels, whose information may be contradictory. Extensive experiments on two large public PAR datasets covering indoor and outdoor scenes demonstrate that our model achieves state-of-the-art results, with notable improvements on fine-grained attributes.
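The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' EA or FPAF implementation: the function names `directional_attention` and `adaptive_fusion` are hypothetical, the learned convolutions a real module would contain are omitted, and the pyramid levels are assumed to have already been resized to a common shape. It only shows the general pattern of pooling along each spatial direction to gate a feature map, and of fusing levels with softmax-normalized weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def directional_attention(x):
    """Gate a feature map x of shape (C, H, W) with per-direction statistics.

    Hypothetical stand-in for a position-aware attention step: the learned
    transforms of a real module are left out.
    """
    pooled_h = x.mean(axis=2, keepdims=True)  # (C, H, 1): average over width
    pooled_w = x.mean(axis=1, keepdims=True)  # (C, 1, W): average over height
    gate_h = sigmoid(pooled_h)
    gate_w = sigmoid(pooled_w)
    # Broadcasting applies both gates at every spatial position.
    return x * gate_h * gate_w


def adaptive_fusion(levels, logits):
    """Fuse same-shaped feature maps with softmax weights over levels.

    levels: list of L arrays of shape (C, H, W), assumed already resized.
    logits: array of shape (L,), e.g. learned per-level scores (assumed).
    """
    stacked = np.stack(levels)              # (L, C, H, W)
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()       # softmax over the L levels
    return np.tensordot(weights, stacked, axes=1)  # (C, H, W)


features = np.ones((2, 3, 4))
attended = directional_attention(features)
fused = adaptive_fusion(
    [np.zeros((2, 3, 4)), np.ones((2, 3, 4))],
    np.array([0.0, 0.0]),
)
```

With equal logits the fusion reduces to a plain average of the levels; unequal logits shift the weight toward the level judged more useful.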