Abstract: Unsupervised Domain Adaptation (UDA) person search aims to transfer a model trained on a labeled source domain to an unlabeled target domain without using target annotations. However, existing UDA methods often overlook scale inconsistency between the source and target domains, which arises from differences in camera height, tilt angle, focal length, and scene layout. To address this challenge, we propose a Scale-Aware Consistent Alignment Learning (SCALE) framework. Specifically, we design a Scale-aware Domain Harmonization (SDH) module that adaptively harmonizes semantic and structural scales through cross-path interaction and consistency refinement, alleviating cross-domain scale inconsistency. To further reduce pseudo-label noise, we introduce a Bidirectional Cluster Regularization (BCR) strategy, which improves pseudo-label reliability by refining the clustering results through a second, regularized clustering step. By jointly mitigating scale misalignment and improving pseudo-label reliability, our approach achieves state-of-the-art performance on two person search benchmarks: 82.3% mAP and 84.0% top-1 on CUHK-SYSU, and 41.7% mAP and 82.4% top-1 on PRW. Our source code is available at https://github.com/whhbdmu/SCALE.
Lay Summary: Our paper studies how to make person search systems work more reliably across different environments without requiring expensive manual labeling. Person search is the task of finding a specific person in large collections of images or videos, which is useful in areas such as video retrieval, public safety research, and smart surveillance systems. However, models trained in one environment often perform poorly in another because people may appear at very different sizes due to camera angles, distances, and scene layouts.
To address this problem, we propose a new framework called SCALE, which helps the model better handle differences in person size and image conditions between datasets. Our method teaches the system to align visual information more consistently across environments and also improves the quality of automatically generated training labels. This allows the model to adapt to new datasets without requiring additional human annotations.
We evaluate our approach on widely used public benchmarks and show that it achieves better search accuracy than previous unsupervised methods. The results demonstrate that improving scale consistency and reducing noisy training signals can significantly enhance cross-domain person search performance. Our work contributes toward more practical and scalable visual search systems that can generalize better to real-world scenarios.
Link To Code: https://github.com/whhbdmu/SCALE
Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning
Keywords: Person search; Unsupervised Domain Adaptation
Submission Number: 21470