Keywords: Deep learning, Action segmentation, Action recognition, Benchmark dataset, Fine-grained actions, Stroke rehabilitation, Seq2seq models, sequence prediction
TL;DR: We introduce a new benchmark dataset for identifying subtle, short-duration actions. We also propose a novel seq2seq approach that outperforms existing methods on both the new dataset and standard benchmark datasets.
Abstract: Automatic action identification from video and kinematic data is an important machine learning problem with applications ranging from robotics to smart health. Most existing works focus on identifying coarse actions such as running, climbing, or cutting vegetables, which have relatively long durations and a complex series of motions. This is an important limitation for applications that require the identification of more elemental motions at high temporal resolution. For example, in the rehabilitation of arm impairment after stroke, quantifying the training dose (number of repetitions) requires differentiating motions with sub-second durations. Our goal is to bridge this gap. To this end, we introduce a large-scale, multimodal dataset, StrokeRehab, as a new action-recognition benchmark that includes elemental short-duration actions labeled at high temporal resolution. StrokeRehab consists of high-quality inertial measurement unit (IMU) sensor data and video data of 51 stroke-impaired patients and 20 healthy subjects performing activities of daily living such as feeding and brushing teeth. Because it contains data from both healthy and impaired individuals, StrokeRehab can be used to study the influence of distribution shift on action-recognition tasks. When evaluated on StrokeRehab, current state-of-the-art models for action segmentation produce noisy predictions, which reduces their accuracy in identifying the corresponding sequence of actions. To address this, we propose a novel approach for high-resolution action identification, inspired by speech-recognition techniques, which is based on a sequence-to-sequence model that directly predicts the sequence of actions. This approach outperforms current state-of-the-art methods on StrokeRehab, as well as on the standard benchmark datasets 50Salads, Breakfast, and JIGSAWS.
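To illustrate why noisy frame-wise predictions hurt sequence identification, consider the standard post-processing step of collapsing consecutive identical frame labels into an action sequence. The sketch below (a minimal illustration; the action names are hypothetical and not taken from the dataset's label set) shows how even a single mislabeled frame inserts spurious actions into the recovered sequence, which is exactly the failure mode that motivates predicting the action sequence directly:

```python
from itertools import groupby

def frames_to_action_sequence(frame_labels):
    """Collapse consecutive identical per-frame labels into
    the ordered sequence of distinct actions."""
    return [label for label, _ in groupby(frame_labels)]

# Ground-truth frames: one "reach", then "grasp", then "retract".
clean = ["reach", "reach", "grasp", "grasp", "grasp", "retract"]
# Same clip, but one frame is mislabeled ("reach" at index 2).
noisy = ["reach", "reach", "grasp", "reach", "grasp", "retract"]

print(frames_to_action_sequence(clean))
# ['reach', 'grasp', 'retract']
print(frames_to_action_sequence(noisy))
# ['reach', 'grasp', 'reach', 'grasp', 'retract'] -- two spurious actions
```

A single frame-level error thus inflates the repetition count, which is why frame-wise accuracy can look high while the derived action sequence, and hence the training-dose estimate, is wrong.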
Supplementary Material: pdf
Open Credentialized Access: The dataset can be accessed via https://simtk.org/. One can create a free SimTK account to access the dataset.
Dataset Url: https://simtk.org/projects/primseq; Code: https://github.com/aakashrkaku/seq2seq_hrar; Sample data: https://drive.google.com/drive/folders/1_a48XeRjFRdwaiaQXAV3tvAYb-4VmDbM?usp=sharing
Dataset Embargo: We have released all three parts of the data: sensor data for stroke-impaired patients, extracted video features for stroke-impaired patients, and sensor data for healthy subjects. We have provided sample data for reviewers' reference.
License: License for the dataset can be found in Appendix I (supplementary material).
Author Statement: Yes
Contribution Process Agreement: Yes
In Person Attendance: Yes