Abstract: Searching for specific persons in surveillance videos captured by different cameras is a key yet under-addressed challenge in multimedia systems. Related person retrieval work mainly focuses on searching for a person by visual appearance, known as person re-identification. However, a query image may not be available in some practical applications. For example, a criminal may be described only indirectly by a text description, such as "A young woman wearing a red casual with a backpack"; traditional methods cannot handle this case. Given a set of pre-defined attributes, the text description query can be transformed into an attribute vector, which can then be used for retrieval in the gallery set. However, the user-provided attributes are sometimes incomplete. We define this new problem as Specific Person Retrieval via Incomplete Text Description. In this paper, we perform attribute completion to enrich the original text query and generate a more expressive attribute vector. Then, pairwise-based metric learning is introduced for the completed attribute vectors. Extensive experiments on two benchmark datasets demonstrate the superior performance of our method.
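As a rough illustration of the pipeline outlined above (text query to attribute vector, completion of missing attributes, then ranking of the gallery), the following is a minimal sketch under stated assumptions. The attribute vocabulary, the function names, the co-occurrence-based completion rule, and the plain Euclidean distance are all hypothetical stand-ins; they are not the completion model or the learned pairwise metric proposed in the paper.

```python
import math

# Hypothetical attribute vocabulary; the actual pre-defined attribute set is not given here.
ATTRIBUTES = ["female", "young", "red_top", "backpack", "hat"]


def query_to_vector(known):
    """Map user-provided attributes to a vector; unspecified attributes stay None."""
    vec = [None] * len(ATTRIBUTES)
    for name, value in known.items():
        vec[ATTRIBUTES.index(name)] = float(value)
    return vec


def complete(vec, gallery):
    """Fill missing attributes with their mean over gallery entries that agree
    with the known attributes (an assumed stand-in for attribute completion)."""
    known = [i for i, v in enumerate(vec) if v is not None]
    # Fall back to the whole gallery if no entry matches the known attributes.
    pool = [g for g in gallery if all(g[i] == vec[i] for i in known)] or gallery
    return [
        v if v is not None else sum(g[i] for g in pool) / len(pool)
        for i, v in enumerate(vec)
    ]


def rank(vec, gallery):
    """Return gallery indices sorted by Euclidean distance to the query
    (an assumed placeholder for a learned metric)."""
    return sorted(range(len(gallery)), key=lambda j: math.dist(vec, gallery[j]))


# Toy gallery of attribute vectors, one row per person (illustrative data only).
gallery = [
    [1.0, 1.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
]

# Incomplete query: the user did not mention a backpack or a hat.
query = query_to_vector({"female": 1, "young": 1, "red_top": 1})
completed = complete(query, gallery)
ranking = rank(completed, gallery)
```

In this toy setting the two unspecified attributes are filled from the gallery entries that match the three known ones, so the completed vector can separate candidates that the incomplete query alone could not.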