DMMLN: A deep multi-task and metric learning based network for video classification

Published: 01 Jan 2017, Last Modified: 13 May 2025 · SSCI 2017 · CC BY-SA 4.0
Abstract: In this paper we propose a novel deep multi-task and metric learning based network for video classification. Specifically, given a video sequence, features are extracted from each frame using a CNN-based temporal segment network, which models both spatial information and long-term temporal dynamics in the video. These per-frame features are then aggregated by a temporal pooling method into a global representation of the complete video, addressing the problem that individual frames may not contain the information indicated by the video-level labels. We then propose a self-adaptive margin based deep metric learning method that captures intra-class variations and inter-class similarities while paying more attention to closer negative samples. Using multi-task learning, the temporal segment network, temporal pooling layer, deep metric learning, and classification are jointly trained to obtain a globally optimized network. Experiments on the UCF101 and HMDB51 datasets show that this approach achieves state-of-the-art performance.
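The abstract does not give the formulas, but the two central components — temporal pooling of per-frame features and a margin that adapts to how close a negative sample is — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the pooling operator (mean), the distance (Euclidean), and the margin schedule `base_margin / (1 + d_an)` are all assumptions chosen so that closer negatives receive a larger effective margin and thus contribute more to the loss.

```python
import numpy as np

def temporal_pool(frame_features):
    # Aggregate per-frame features (T x D) into one global video
    # descriptor (D,) by average pooling over the temporal axis.
    # Mean pooling is an assumption; the paper's pooling may differ.
    return np.mean(frame_features, axis=0)

def adaptive_margin_triplet_loss(anchor, positive, negative, base_margin=1.0):
    # Hypothetical self-adaptive margin triplet loss: the margin
    # grows as the negative gets closer to the anchor, so hard
    # (close) negatives are penalized more than easy (far) ones.
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    margin = base_margin / (1.0 + d_an)        # larger for close negatives
    return max(0.0, d_ap - d_an + margin)      # standard hinge form
```

With this schedule a far negative can drive the hinge to zero while a nearby negative with the same anchor-positive distance still yields a positive loss, which matches the abstract's stated emphasis on closer negative samples.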