Abstract: Traditional text-based person re-identification (ReID) assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
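The looking-forward idea mentioned in the abstract can be illustrated with a toy sketch. This is not the LLaVA-ReID implementation: it assumes that each person is reduced to a small attribute dictionary, that a question asks about exactly one attribute, and that the target's true attributes are available at training time to simulate the witness's answer. Under those assumptions, the most informative question is taken to be the one whose simulated answer moves the target highest in the ranked gallery. All function names and data below are hypothetical.

```python
# Toy sketch only: attribute dictionaries stand in for images and text.
# All names here are hypothetical and not taken from the LLaVA-ReID codebase.

def score(query, candidate):
    """Count the query attributes that the candidate matches exactly."""
    return sum(1 for key, value in query.items() if candidate.get(key) == value)


def target_rank(query, gallery, target_id):
    """Return the 1-based rank of the target; ties count against the target."""
    target_score = score(query, gallery[target_id])
    tied_or_better = sum(
        1 for pid, attrs in gallery.items()
        if pid != target_id and score(query, attrs) >= target_score
    )
    return tied_or_better + 1


def look_forward(query, questions, gallery, target_id):
    """Pick the question whose simulated answer gives the target the best rank."""
    target = gallery[target_id]
    best_question, best_rank = None, None
    for attribute in questions:
        # Skip attributes already described or unknown for the target.
        if attribute in query or attribute not in target:
            continue
        # Simulate the witness's answer using the target's true attribute.
        simulated = dict(query, **{attribute: target[attribute]})
        rank = target_rank(simulated, gallery, target_id)
        if best_rank is None or rank < best_rank:
            best_question, best_rank = attribute, rank
    return best_question, best_rank


gallery = {
    "p1": {"top": "plaid shirt", "pants": "blue", "bag": "backpack"},
    "p2": {"top": "plaid shirt", "pants": "black", "bag": "backpack"},
    "p3": {"top": "plaid shirt", "pants": "blue", "bag": "none"},
    "p4": {"top": "plaid shirt", "pants": "blue", "bag": "handbag"},
}
initial_query = {"top": "plaid shirt"}
question, rank = look_forward(initial_query, ["pants", "bag"], gallery, "p1")
print(question, rank)
```

In this toy gallery, asking about the bag separates the target from more candidates than asking about the pants, so it would be the question selected as supervision for that dialogue round.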
Lay Summary: Imagine you’re trying to help security staff find someone you saw earlier on a busy street or in a shopping mall. You might say, “He was tall, wearing a plaid shirt, and carrying a bag.” But such descriptions are often vague and incomplete, making it hard for computer systems to identify the person in surveillance footage.
Our research proposes a more interactive solution. Instead of relying on a one-time description, our system asks follow-up questions to refine your memory, much like a helpful assistant. It might ask, “What color were his pants?” or “Was he carrying anything else?”
This approach helps systems find the right person more accurately and efficiently in real-world settings like malls, transit hubs, or office buildings.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/XLearning-SCU/LLaVA-ReID
Primary Area: Applications->Computer Vision
Keywords: Person Re-Identification, Interactive Retrieval
Submission Number: 249