Abstract: In this paper we propose a novel spatio-temporal descriptor for action recognition. We extend a recent local image descriptor, DAISY, to three dimensions to capture the information in the additional temporal dimension of videos. The new 3D DAISY descriptor is both discriminative and computationally efficient. We use the bag-of-words framework and a non-linear SVM for classification. Experiments on the public action datasets KTH, Weizmann, YouTube, and UT-Interaction show that our method achieves promising results.
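As a rough illustration of the bag-of-words step mentioned above, the sketch below quantizes a set of local descriptors against a codebook and builds a normalized histogram, which is the kind of fixed-length vector a non-linear SVM could then classify. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `nearest_word` and `bag_of_words_histogram`, the two-dimensional toy descriptors, the three-word codebook, the Euclidean distance, and the L1 normalization are all hypothetical choices made for illustration. Real 3D DAISY descriptors would be much higher-dimensional, and the codebook would typically be learned by clustering.

```python
import math


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor.

    Euclidean distance is an assumption made for this sketch.
    """
    best_index, best_dist = 0, math.inf
    for index, word in enumerate(codebook):
        dist = math.dist(descriptor, word)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return an L1-normalized histogram."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors were extracted: return an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy stand-ins (hypothetical): a 3-word codebook and 4 local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]

histogram = bag_of_words_histogram(descriptors, codebook)
print(histogram)
```

Each video would yield one such histogram regardless of how many descriptors it contains, which is what lets a standard classifier operate on videos of different lengths.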