University of Amsterdam and Renmin University at TRECVID 2016: Searching Video, Detecting Events and Describing Video
Abstract: In this paper, we summarize our TRECVID 2016 video recognition experiments. We participated in three tasks: video search, event detection, and video description; here we describe our work on event detection and video description. For event detection, we explore semantic representations based on VideoStory and ImageNet Shuffle in both the zero-example and few-example regimes. For the video description showcase task, we experiment with a deep network that predicts a visual representation from a natural language description and use this visual space for sentence matching. For generative description, we enhance a neural image captioning model with Early Embedding and Late Reranking. The 2016 edition of the TRECVID benchmark was a fruitful participation for our joint team, yielding the best overall results for zero- and few-example event detection, as well as for video description in both the matching and generative settings.