Abstract: Analyzing videos has been one of the fundamental problems of computer vision and multimedia content analysis for decades. The task is very challenging because video is an information-intensive medium with large variations and complexities. Thanks to the recent development of deep learning techniques, researchers in both the computer vision and multimedia communities are now able to boost the performance of video analysis significantly and to initiate new research directions for analyzing video content. This tutorial will present recent advances under the umbrella of video understanding, starting from a unified deep learning toolkit, the Microsoft Cognitive Toolkit (CNTK), which supports popular model types such as convolutional nets and recurrent networks, moving on to the fundamental challenges of video representation learning and video classification and recognition, and finally turning to the emerging area of video and language.