Abstract: Recent studies on pedestrian attribute recognition have achieved significant improvements by utilizing complex networks and attention mechanisms. However, most of these studies learn the attention map implicitly through the class activation map. In this paper, we propose an explicit attention modeling approach for pedestrian attribute recognition. We construct a mask branch to learn the attention maps with a lightweight feature pyramid network. The features inside the specific mask are then averaged to obtain the scores for attribute recognition. Additionally, we introduce spatial and semantic distillation to improve the consistency of attention masks and attribute scores. Our experiments demonstrate that the proposed explicit attention modeling can achieve state-of-the-art performance on PA100K, PETA, and PAR datasets with negligible parameters.
0 Replies
Loading