Abstract: Human action detection in videos is a challenging problem in computer vision and has become an active research area in recent years. Most published methods analyze the entire video and assign it a single action label. By contrast, our research indicates that most actions can be detected from only a few frames. Based on this hypothesis, we propose a temporal-structure-based model, the Latent Key Frames Model (LKFM), in which an action is represented as a sequence of Key Frames. LKFM finds the optimal Key Frame sequence using a latent support vector machine (Latent SVM), and for each Key Frame in the sequence a 2D model is built using the Deformable Part-based Model (DPM). The proposed method has been evaluated on the Weizmann and UCF Sports datasets, and the experimental results demonstrate that the model achieves competitive performance.
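
To make the idea of selecting an optimal Key Frame sequence concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a per-frame score for each Key Frame model is already available (for example, from a DPM-style detector) and that Key Frames must appear in strictly increasing temporal order. Under those assumptions, the best placement can be found by dynamic programming. The function name `best_key_frames` and the `scores` layout are hypothetical and chosen only for illustration; the abstract does not specify the inference procedure.

```python
from typing import List, Tuple


def best_key_frames(scores: List[List[float]]) -> Tuple[float, List[int]]:
    """Pick one frame per key-frame model, in temporal order, maximizing total score.

    scores[k][t] is an assumed, precomputed score of key-frame model k on frame t.
    Returns (best total score, list of chosen frame indices, strictly increasing).
    """
    K = len(scores)
    T = len(scores[0])
    if K > T:
        raise ValueError("need at least as many frames as key frames")

    NEG = float("-inf")
    # best[k][t]: best total when key frames 0..k are placed and key frame k is at frame t
    best = [[NEG] * T for _ in range(K)]
    # back[k][t]: frame chosen for key frame k-1 on that best path
    back = [[-1] * T for _ in range(K)]

    for t in range(T):
        best[0][t] = scores[0][t]

    for k in range(1, K):
        run_best, run_arg = NEG, -1  # running max of best[k-1][0..t-1]
        for t in range(T):
            if t > 0 and best[k - 1][t - 1] > run_best:
                run_best, run_arg = best[k - 1][t - 1], t - 1
            if run_arg >= 0:
                best[k][t] = run_best + scores[k][t]
                back[k][t] = run_arg

    # Recover the path by following back-pointers from the best final frame.
    t_end = max(range(T), key=lambda t: best[K - 1][t])
    total = best[K - 1][t_end]
    path = [t_end]
    for k in range(K - 1, 0, -1):
        path.append(back[k][path[-1]])
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Two hypothetical key-frame models scored on a four-frame clip.
    demo = [[1.0, 5.0, 2.0, 0.0],
            [0.0, 1.0, 4.0, 3.0]]
    print(best_key_frames(demo))
```

In a latent SVM setting, a search of this kind would typically serve as the latent-variable inference step, with the chosen frame positions treated as latent variables during training; how LKFM actually performs this step is not stated in the abstract.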