A Multitask Disentanglement Framework Guided by Pedestrian Attributes for Video-Based Clothes-Changing Person Re-Identification in Internet of Things

Hengjie Lu, Guangjin Pan, Yilin Gao, Shugong Xu

Published: 01 Mar 2026, Last Modified: 01 Mar 2026 | IEEE Internet of Things Journal | CC BY-SA 4.0
Abstract: Person re-identification (ReID), a crucial technology for intelligent surveillance in Internet of Things (IoT) systems, aims to find a target person across nonoverlapping surveillance cameras. Video-based clothes-changing person ReID (VCC-ReID) has become essential because videos carry rich information and the task has broad applications. Because clothes are attached to the human body, clothing and pedestrian features are highly coupled during feature extraction, which makes VCC-ReID challenging. To address this challenge, we propose a Multitask Disentanglement Framework guided by Pedestrian Attributes (MTDF-PAttr), whose core is a cross-domain attribute distillation decoupling mechanism. MTDF-PAttr uses pedestrian attribute recognition (PAR) as an auxiliary task to guide feature decoupling, thereby improving performance on the main task, VCC-ReID. Since the existing VCC-ReID dataset lacks PAR annotations, we train the auxiliary task through knowledge distillation, with a pretrained video-based PAR network as the teacher. To give the PAR teacher network higher accuracy, stronger generalization, and the ability to recognize more attributes, we propose a multidataset fusion framework for pedestrian attribute recognition (MDFF-PAttr), whose core is a multiteacher collaborative self-distillation mechanism. MDFF-PAttr can train on multiple datasets simultaneously and provides a powerful teacher model from which MTDF-PAttr distills its auxiliary task. Experimental results demonstrate that MTDF-PAttr achieves state-of-the-art performance on the VCC-ReID task, providing an effective method for intelligent surveillance systems in the IoT. In addition, MDFF-PAttr effectively enhances the accuracy and generalization of the PAR network.
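The idea of training an auxiliary attribute task by distillation alongside the main ReID task can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss functions, so the sketch assumes a softmax cross-entropy identity loss, a binary cross-entropy distillation loss against the teacher's soft attribute probabilities, and a hypothetical weighting parameter `weight`. All function names are illustrative.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def attribute_distillation_loss(student_logits, teacher_probs, eps=1e-7):
    """Mean binary cross-entropy between teacher soft labels and student outputs.

    Assumption: each attribute is an independent binary label, and the
    pretrained PAR teacher supplies a probability per attribute.
    """
    total = 0.0
    for logit, target in zip(student_logits, teacher_probs):
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(sigmoid(logit), eps), 1.0 - eps)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(student_logits)


def identity_loss(id_logits, label):
    """Softmax cross-entropy for the main identity classification task."""
    m = max(id_logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in id_logits))
    return log_sum - id_logits[label]


def multitask_loss(id_logits, label, student_attr_logits, teacher_attr_probs,
                   weight=0.5):
    """Main-task loss plus a weighted auxiliary distillation loss.

    `weight` is a hypothetical hyperparameter, not a value from the paper.
    """
    aux = attribute_distillation_loss(student_attr_logits, teacher_attr_probs)
    return identity_loss(id_logits, label) + weight * aux
```

In this sketch the teacher's attribute probabilities stand in for the PAR annotations that the VCC-ReID dataset lacks, so the auxiliary branch can be supervised without any manual attribute labels.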