Fast-slow visual network for action recognition in videos

2022 (modified: 08 Nov 2022), Multim. Tools Appl. 2022
Abstract: Visual speed is important information in videos and can improve the performance of video action recognition, because modeling the visual speed of different actions makes them easier to recognize. Previous works have often captured visual speed by sampling raw videos at multiple rates to build an input-level frame pyramid, which usually requires a costly multi-branch network. In this work, we propose a fast–slow visual network (FSVN) that improves the accuracy of video action recognition through a visual speed stripping strategy, which can be flexibly integrated into various 2-D or 3-D backbone networks. Specifically, using frame differences, we divide a video into fast visual frames and slow visual frames, which represent the action information and the spatial information in the video, respectively. We then design a fast visual information recognition network to capture the action information and a slow visual information recognition network to capture the spatial information, and finally integrate the two networks. Experiments on the UCF101 (98.3%) and HMDB51 (76.4%) datasets show that our method outperforms traditional approaches.
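The abstract does not say exactly how frame differences are turned into "fast" and "slow" frames. The sketch below shows one plausible reading, not the authors' method. It assumes that each frame is scored by its mean absolute difference from the previous frame, and that frames scoring above a hypothetical `threshold` count as fast (motion-heavy) while the rest count as slow (mostly static spatial content). The function name `split_fast_slow` and the threshold rule are illustrative assumptions.

```python
import numpy as np


def split_fast_slow(video: np.ndarray, threshold: float):
    """Hypothetical fast/slow frame split based on frame differencing.

    video: array of shape (T, H, W) or (T, H, W, C) with T >= 2.
    threshold: assumed cutoff on the mean absolute frame difference.
    Returns (fast_idx, slow_idx), the frame indices whose motion score is
    above, or at/below, the threshold.
    """
    frames = video.astype(np.float32)
    # Absolute difference between consecutive frames, averaged over all pixels.
    diffs = np.abs(frames[1:] - frames[:-1])
    scores = diffs.reshape(diffs.shape[0], -1).mean(axis=1)
    # Frame 0 has no predecessor, so reuse the score of the first pair.
    scores = np.concatenate([scores[:1], scores])
    fast_idx = np.flatnonzero(scores > threshold)
    slow_idx = np.flatnonzero(scores <= threshold)
    return fast_idx, slow_idx


if __name__ == "__main__":
    # Toy clip: four 2x2 frames, where only frame 2 is bright.
    clip = np.zeros((4, 2, 2))
    clip[2] = 10.0
    fast, slow = split_fast_slow(clip, threshold=5.0)
    print("fast:", fast.tolist(), "slow:", slow.tolist())
```

In this reading, the two index sets would then feed the fast and slow recognition branches separately. On real clips the threshold would need tuning per dataset, and a relative criterion (for example, a percentile of the scores) may be more robust than a fixed value.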