Abstract: Group re-identification (GReID) is crucial in intelligent video surveillance for retrieving human groups across cameras. However, existing works mainly focus on group variation challenges, including membership and layout changes, and neglect occluded groups. To address this, we propose a novel Siamese Transformer for GReID that integrates a multiscale feature transform with joint learning. Specifically, the multiscale feature transform comprises global feature mapping and local feature encoding. To enhance robustness against occlusions, local feature encoding uses a random patch regrouping module (RPRM) and dynamical alignment of local features (DALF). RPRM rearranges and transforms member patch embeddings, generating local features with diversified coverage for handling occluded groups. DALF dynamically aligns the local features to handle misalignment caused by occlusions. Additionally, we employ joint learning of identification and verification to extract robust and discriminative group representations. Experimental results on three benchmark datasets confirm the effectiveness and superiority of the proposed method.
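As a rough illustration of the patch regrouping idea described in the abstract, the following minimal sketch shuffles patch embeddings, splits them into random groups, and pools each group into one local feature. It is not the authors' RPRM: the function name `random_patch_regroup`, the use of mean pooling as the "transform", and the toy sizes are all assumptions made for illustration only.

```python
import numpy as np


def random_patch_regroup(patches, num_groups, rng):
    """Hypothetical sketch of random patch regrouping (not the paper's RPRM).

    patches: array of shape (num_patches, dim) holding patch embeddings.
    Returns an array of shape (num_groups, dim), one pooled feature per group.
    """
    num_patches = patches.shape[0]
    if num_groups <= 0 or num_groups > num_patches:
        raise ValueError("num_groups must be in [1, num_patches]")
    # Randomly permute patch indices so each group covers scattered regions.
    order = rng.permutation(num_patches)
    # Split the shuffled indices into num_groups nearly equal chunks.
    groups = np.array_split(order, num_groups)
    # Assumption: mean pooling stands in for the learned transform.
    return np.stack([patches[idx].mean(axis=0) for idx in groups])


rng = np.random.default_rng(0)
patches = rng.normal(size=(12, 8))  # 12 toy patch embeddings of dimension 8
local_feats = random_patch_regroup(patches, 4, rng)
print(local_feats.shape)  # (4, 8)
```

Because each group draws patches from random positions, no single local feature depends entirely on one image region, which is the intuition behind "diversified coverage" when part of a group is occluded.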