Keywords: Person Re-Identification; Joint Training; Language Guidance; Dual Text Supervision; Cross-Domain Generalization
Abstract: Person Re-Identification (ReID) is a key problem in intelligent surveillance, but ReID models often suffer from dataset bias and generalize poorly across domains. This paper presents DTTA, a network that incorporates natural language descriptions into the backbone to enhance the interpretability and discriminability of visual features. A dual-level supervision mechanism (CESL) aligns global semantics and constrains local details, while joint training on Market-1501, CUHK03, and MSMT17 mitigates dataset bias. Experiments demonstrate that DTTA achieves superior Rank-1 accuracy and mAP, particularly under cross-domain settings, offering new insights into multimodal and multi-dataset ReID.
Submission Number: 9