Abstract: Multi-view human actions exhibit two kinds of correlation: temporal correlation between adjacent frames and spatial correlation between different cameras. In this paper, we introduce a discriminative model, the two-dimensional Conditional Random Field (2D CRF), which models the spatial-temporal correlations in actions, and we present algorithms based on this model for multi-view single-person action recognition and two-person interaction recognition. For action representation, we model each action as a bag of visual words built from spatial-temporal features; for action recognition, we apply the 2D CRF to both the single-person and the two-person tasks. We evaluate on the IXMAS dataset and our Multi-view Human Interaction (MHI) dataset. The results show that the proposed approach outperforms most state-of-the-art methods.
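To make the bag-of-visual-words representation concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes a codebook of visual words is already available (for example from clustering training descriptors) and that each spatial-temporal feature is a fixed-length numeric vector; the function names `nearest_word` and `bag_of_words`, the squared Euclidean distance, and the normalization to unit sum are all illustrative assumptions.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to the descriptor.

    Squared Euclidean distance is an assumption made for illustration.
    """
    best_index = 0
    best_dist = float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(descriptors: Sequence[Sequence[float]],
                 codebook: Sequence[Sequence[float]]) -> List[float]:
    """Build a normalized histogram of visual-word occurrences."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors were extracted: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Hypothetical toy data: a two-word codebook and four 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
descriptors = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
histogram = bag_of_words(descriptors, codebook)
```

In a pipeline of this kind, one such histogram per frame (or short clip) and per camera would serve as the observation at the corresponding node of the 2D CRF lattice; that mapping is an assumption drawn from the description above rather than a detail stated in the abstract.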