Abstract: Person retrieval in video using natural language descriptions (NLD) is an emerging research area that depends largely on dataset diversity. Unifying datasets can improve overall data quality; this paper therefore presents a case study on merging two datasets of different styles: one pairs NLDs with images (CUHK-PEDES), and the other pairs discrete annotations with videos (AVSS). Building the unifying framework brings out the practical challenges and their solutions. Explicit discussions of dataset-merging frameworks are missing from the literature, and our work is intended to fill this gap for researchers.