Looking Clearer With Text: A Hierarchical Context Blending Network for Occluded Person Re-Identification

Published: 01 Jan 2025. Last Modified: 22 Jul 2025. IEEE Trans. Inf. Forensics Secur. 2025. License: CC BY-SA 4.0.
Abstract: Existing occluded person re-identification (re-ID) methods mainly learn limited visual information about occluded pedestrians from images. However, textual information, which can describe a wide range of human appearance attributes, is rarely fully utilized in this task. To address this issue, we propose a Text-guided Hierarchical Context Blending Network (THCB-Net) for occluded person re-ID. Specifically, at the data level, informative multi-modal inputs are first generated to make full use of the auxiliary role of textual information and to endow the image data with a strong inductive bias toward occluded environments. At the feature-expression level, we design a novel Hierarchical Context Blending (HCB) module that adaptively integrates shallow appearance features obtained by CNNs with multi-scale semantic features from a visual transformer encoder. At the model-optimization level, a Multi-modal Feature Interaction (MFI) module is proposed to learn multi-modal pedestrian information from texts and images; it then guides the visual transformer encoder and the HCB module to learn more discriminative identity information for occluded pedestrians through Image-Multimodal Contrastive (IMC) learning. Extensive experiments on standard occluded person re-ID benchmarks demonstrate that the proposed THCB-Net outperforms state-of-the-art methods. The code will be available soon.
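The abstract describes Image-Multimodal Contrastive (IMC) learning as the mechanism that aligns image features with fused image-text features. The sketch below illustrates the general idea with an InfoNCE-style contrastive loss: each image feature is pulled toward its paired multi-modal feature and pushed away from mismatched pairs. The function name, temperature value, and exact formulation are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def imc_loss(image_feats, mm_feats, temperature=0.07):
    """Hypothetical InfoNCE-style sketch of IMC learning: matched
    (image, multi-modal) pairs sit on the diagonal of the similarity
    matrix and are treated as positives; all other pairs are negatives.
    This is an illustrative formulation, not the paper's exact loss."""
    # L2-normalize both feature sets so the dot product is cosine similarity
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    mm = mm_feats / np.linalg.norm(mm_feats, axis=1, keepdims=True)
    logits = img @ mm.T / temperature            # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    n = len(img)
    # cross-entropy against the diagonal (matched) targets
    return -np.log(probs[np.arange(n), np.arange(n)]).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
# perfectly aligned pairs should incur lower loss than shuffled pairs
aligned = imc_loss(feats, feats)
shuffled = imc_loss(feats, feats[::-1].copy())
```

Minimizing such a loss encourages the visual encoder to produce features close to the text-enriched multi-modal representation of the same identity, which is how contrastive guidance from text can supervise a purely visual branch at inference time.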