Text-Based Person Search in Full Images via Semantic Context Disentangling and Prototype Learning

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Cross-modal Retrieval; Text-based Person Search; Context Disentangling; Prototype Learning
TL;DR: This study introduces a framework combining Context Disentangling and Prototype Inheriting to enhance TBPS in full images, improving robustness and performance in complex scenes.
Abstract: Text-based Person Search (TBPS) in full images aims to locate a target pedestrian within uncropped images based on natural language descriptions. Existing TBPS methods typically rely on candidate region generation and cross-modal matching. However, in complex scenes, especially those with multiple pedestrians in the image, it is often challenging to distinguish the target pedestrian from the background or from other individuals, which limits generalization. To address these issues, we propose a new TBPS framework named ProtoDis-TBPS, which integrates three key components: Semantic Context Decoupling (SCD), Prototype Embedding Learning (PEL), and a Cross-modal Person Re-identification (ReID) module. Specifically, SCD enhances cross-modal feature discrimination by separating out background and irrelevant contextual information. PEL improves the model's robustness in complex scenes by learning prototype features for pedestrian categories. Finally, the ReID module, based on a Transformer architecture, further boosts the accuracy of both text-based pedestrian detection and re-identification in full images. Experiments demonstrate that the proposed method competes strongly with existing approaches in this field.
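The abstract does not specify how matching or prototypes are computed, so the following is only a minimal sketch of the general idea behind "cross-modal matching" over candidate regions and "prototype features" for a category. It assumes, purely for illustration, that text and region features already live in a shared embedding space, that matching is scored by cosine similarity, and that a prototype is the mean of a category's embeddings. The function names (`cosine`, `class_prototype`, `rank_regions`) and the toy vectors are hypothetical and are not taken from ProtoDis-TBPS.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def class_prototype(embeddings):
    # Assumed prototype definition: the element-wise mean of a category's embeddings.
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def rank_regions(text_emb, region_embs):
    # Score every candidate region against the text query and
    # return region indices sorted from best to worst match.
    scores = [cosine(text_emb, r) for r in region_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy data (hypothetical): one text embedding and three candidate regions.
text_emb = [1.0, 0.0, 1.0]
regions = [[0.0, 1.0, 0.0], [0.9, 0.1, 1.1], [1.0, 1.0, 0.0]]
order, scores = rank_regions(text_emb, regions)

# Prototype built from two embeddings assumed to belong to the same category.
prototype = class_prototype([[1.0, 0.0, 1.0], [0.8, 0.2, 1.2]])
```

In this toy setup the region whose embedding points in nearly the same direction as the text embedding ranks first. A real system would obtain the embeddings from learned image and text encoders rather than from hand-written vectors.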
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 10668