Abstract: One-step Weakly Supervised Person Search (WSPS) addresses person detection and re-identification (ReID) within a unified framework, relying solely on pedestrian bounding box annotations for training, without requiring annotated identity labels. This approach enhances the practicality and efficiency of person search in real-world applications. However, WSPS faces two primary challenges: (1) the significant feature discrepancy between ReID and pedestrian detection tasks complicates shared representation learning, and (2) accurately estimating pseudo identity for each person image is challenging due to unrefined detections and significant intra-class variation in complex scenes. To address these challenges, we introduce a multi-granularity scene-aware graph convolution framework, which jointly optimizes task-specific features, improves pseudo-label estimation, and reduces the effects of label noise. Specifically, the Multi-granularity Feature Alignment (MFA) module in our designed two-branch network leverages bi-directional cluster-level interactions across multiple granularities to address the feature discrepancy. Building on MFA, we develop the Graph-convolution-based feature enhancement for more reliable Scene-aware pseudo-label Estimation (GSE). Meanwhile, the Label Refinement module, with its global-local Collaborative Learning (LCL) mechanism, addresses label noise by refining labels at both global and local levels, ensuring more robust weakly supervised learning. Extensive experimental evaluations demonstrate the effectiveness of the proposed method, achieving significant performance improvements over state-of-the-art approaches on the CUHK-SYSU and PRW datasets. Code is available at https://github.com/haichuntai/MSGM-main.
External IDs:dblp:journals/ijcv/ChengTWZLG26
Loading