Abstract: Person re-identification (ReID) aims to match a target pedestrian across non-overlapping camera views, a task that faces significant challenges such as data dependency and overfitting due to the limited availability of annotated datasets. Recently, large pre-training models, such as CLIP, have emerged as a transformative paradigm in visual learning. These models generate universal features with strong discriminative power and generalization capabilities, making them highly suitable for diverse ReID tasks. In this paper, we present a comprehensive survey of large pre-training models for ReID, which we refer to as the "One for All" approach. This paradigm leverages large-scale pre-training models, such as self-supervised or language-image pre-training models, as general feature extractors adaptable to multiple ReID tasks. We provide an in-depth analysis of the strengths of the CLIP and DINO series, summarizing their applications and advancements across six extensively studied ReID directions, and we review self-supervised pre-training approaches for ReID and related tasks, highlighting their ability to achieve task-agnostic adaptability. Our findings demonstrate the immense potential of large pre-training models in advancing ReID research, offering valuable insights for future developments in this field. Relevant datasets and documents are available at https://github.com/Vill-Lab/Awesome-Evolving-ReID.
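To make the "One for All" paradigm concrete, the sketch below shows one plausible minimal setup (not a pipeline prescribed by this survey): a frozen CLIP image encoder from the Hugging Face transformers library is used as a general feature extractor, and ReID matching reduces to cosine-similarity ranking of person-crop embeddings across camera views. The model name and file paths are illustrative assumptions.

```python
# Minimal sketch of "one for all" ReID: a frozen CLIP image encoder as a
# general feature extractor, with cosine-similarity ranking for matching.
# Checkpoint name and image paths are illustrative, not from the survey.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch16"  # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

@torch.no_grad()
def extract_features(images):
    """Encode a list of PIL person crops into L2-normalized CLIP embeddings."""
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Rank gallery crops for a query crop by cosine similarity (hypothetical files).
query = extract_features([Image.open("query.jpg")])
gallery = extract_features([Image.open(f"gallery_{i}.jpg") for i in range(4)])
ranking = (query @ gallery.T).argsort(dim=-1, descending=True)
print(ranking)  # gallery indices, most similar identity first
```

Because the encoder stays frozen, the same features can in principle serve multiple ReID directions; task-specific adaptation (e.g., prompt tuning or lightweight fine-tuning) is what the surveyed methods build on top of this backbone.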