A Robust Person Shape Representation via Grassmann Channel Pooling

Tetsu Matsukawa, Einoshin Suzuki

Published: 30 Nov 2024, Last Modified: 15 Nov 2024 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: Robustly estimating a person's orientation across various clothing and image styles is essential for deploying vision systems in real-world applications. In this task, the spatial arrangement of local parts can be a key factor for precise estimation. We therefore focus on channel pooling, which summarizes the less relevant channel activations of a feature map produced by ConvNets. However, the representations produced by naive channel pooling methods have limited discriminative ability, which leads to imprecise estimations. To address this problem, we propose Grassmann Channel Pooling (GCP), which summarizes each feature map as a linear subspace spanned by its spatial bases. Specifically, GCP extracts spatial bases from a feature map, where each basis represents globally similar positions across channels. The linear subspace spanned by these bases is invariant to permutations of the feature channels and to scalings of the feature map, and is thus expected to be robust. At the same time, GCP extracts discriminative co-occurrence information from various spatial positions using the projection metric on the Grassmann manifold. Experimental results on the PersonX and TUD datasets indicate that GCP has greater discriminative power than existing pooling methods and confirm its robustness.
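
The invariance claim in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not say how the spatial bases are extracted, so this sketch assumes they are the top-k right singular vectors of the feature map reshaped to channels × positions. The function names `spatial_subspace` and `projection_distance` and the parameter `k` are hypothetical. The distance used is the standard projection metric between two k-dimensional subspaces with orthonormal bases U1 and U2, namely sqrt(k - ||U1^T U2||_F^2).

```python
import numpy as np


def spatial_subspace(feature_map, k):
    """Return an orthonormal basis (HW x k) of a spatial subspace.

    Assumption: the bases are the top-k right singular vectors of the
    C x HW matrix; the abstract does not specify the extraction step.
    """
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w)  # rows: channels, columns: positions
    # Right singular vectors have length HW, i.e. they live in the spatial domain.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:k].T


def projection_distance(u1, u2):
    """Projection metric between two k-dim subspaces with orthonormal bases."""
    k = u1.shape[1]
    overlap = np.linalg.norm(u1.T @ u2, "fro") ** 2
    # Clamp at zero to guard against tiny negative values from round-off.
    return float(np.sqrt(max(k - overlap, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.standard_normal((16, 4, 3))  # toy feature map: C=16, H=4, W=3

    # Permute the channels and rescale the whole map.
    permuted_scaled = 2.5 * fmap[rng.permutation(16)]

    u_a = spatial_subspace(fmap, k=3)
    u_b = spatial_subspace(permuted_scaled, k=3)
    print("distance after permutation and scaling:", projection_distance(u_a, u_b))
```

Under this assumption, permuting channels multiplies the C × HW matrix on the left by a permutation matrix and scaling multiplies it by a scalar. Neither changes the right singular subspace, so the printed distance should be zero up to floating-point error.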