Abstract: Understanding user actions from egocentric videos is crucial for developing intelligent mixed reality (MR) systems. One key aspect is the recognition of hand actions and gestures, which enables user interaction and allows the system to adapt to real-world user actions. In this paper, we present a comprehensive pipeline for egocentric hand action recognition for mixed reality applications. Our approach incorporates an MR-guided data collection method that eliminates the need for explicit manual annotation and guidance. We also propose a robust and efficient skeleton-based hand action recognition model designed specifically for real-time MR use cases. To validate our proposed framework and demonstrate its effectiveness, we conducted a case study involving industrial precision inspection tasks. Using our MR-guided data collection system, we efficiently collected hand inspection action data and built a comprehensive dataset. We then trained our proposed model on this dataset, employing a feature refinement strategy. We conducted extensive evaluations, including standard offline analysis and real-time inference in an MR system, to thoroughly test the model. Our experimental results showcase the efficacy of our proposed pipeline and its potential for practical use in various scenarios. The supplementary materials, including source code, datasets, and demo videos, are publicly available at www.sail-nu.com/ismar-har.