Weakly Supervised Action Recognition and Localization Using Web Images

Cuiwei Liu, Xinxiao Wu, Yunde Jia

2014 (modified: 18 Nov 2022)ACCV (5) 2014Readers: Everyone

Abstract: This paper addresses the problem of joint recognition and localization of actions in videos. We develop a novel Transfer Latent Support Vector Machine (TLSVM) by using Web images and weakly annotated training videos. In order to alleviate the laborious and time-consuming manual annotations of action locations, the model takes training videos which are only annotated with action labels as input. Due to the non-available ground-truth of action locations in videos, the locations are treated as latent variables in our method and are inferred during both training and testing phrases. For the purpose of improving the localization accuracy with some prior information of action locations, we collect a number of Web images which are annotated with both action labels and action locations to learn a discriminative model by enforcing the local similarities between videos and Web images. A structural transformation based on randomized clustering forest is used to map Web images to videos for handling the heterogeneous features of Web images and videos. Experiments on two publicly available action datasets demonstrate that the proposed model is effective for both action localization and action recognition.

0 Replies