Abstract: Human action recognition is an active research topic in computer vision, with extensive applications including human-computer interaction, robotics, and surveillance. Several notable human action datasets have been published. However, they still have limitations, such as restricted modality types, few camera views, and limited scene complexity and action categories. Furthermore, most of them are designed for a subset of the action recognition problem, such as single-modal or single-view recognition. In this paper, we propose a multi-modal cross-view human action dataset (named CAS-YNU MCHAD). It consists of five synchronized data modalities: RGB images, depth maps, human segmentation maps, skeleton data, and inertial data. Our dataset contains 14,782 action samples of 10 actions performed by 50 subjects and captured from two viewpoints. Extensive benchmark experiments on this dataset are conducted with state-of-the-art recognition approaches. Experimental results show that the dataset is challenging for cross-view action recognition. We also provide baselines for the assessment of existing recognition methods.
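As an illustration of how one synchronized multi-modal sample described in the abstract might be organized, the sketch below defines a hypothetical record grouping the five modalities with the action, subject, and view annotations. All class and field names here are assumptions for illustration only; they are not part of the released dataset or any official API.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class MCHADSample:
    """Hypothetical container for one synchronized multi-modal action sample
    (names and array layouts are assumed, not taken from the dataset release)."""
    rgb: np.ndarray        # (T, H, W, 3) RGB frames
    depth: np.ndarray      # (T, H, W) depth maps
    segment: np.ndarray    # (T, H, W) human segmentation maps
    skeleton: np.ndarray   # (T, J, 3) 3D joint positions for J joints
    inertial: np.ndarray   # (T_imu, C) inertial sensor readings
    action_label: int      # one of the 10 action categories
    subject_id: int        # one of the 50 subjects
    view_id: int           # one of the two camera viewpoints
```

Such a per-sample grouping would let cross-view and cross-modal benchmark splits be built by filtering on `view_id` and selecting subsets of the modality fields.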