MGR-Dark: A Large Multimodal Video Dataset and RGB-IR benchmark for Gesture Recognition in Darkness

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Poster, License: CC BY 4.0
Abstract: Gesture recognition plays a crucial role in natural human-computer interaction and sign language recognition. Despite considerable progress under normal daylight, research dedicated to gesture recognition in dark environments is scarce, partly because suitable datasets are lacking. We address this gap by collecting a new large-scale multimodal video dataset for gesture recognition in darkness (MGR-Dark). MGR-Dark is distinguished from existing gesture datasets by its collection of gestures in darkness, its multimodal videos (RGB, Depth, and Infrared), and its high video quality. To the best of our knowledge, it is the first high-quality multimodal dataset dedicated to human gestures in dark videos. Building upon this dataset, we propose a Modality Translation and Cross-modal Distillation (MTCD) RGB-IR benchmark framework. Specifically, a modality translator first transfers RGB data to pseudo-Infrared data; a progressive cross-modal feature distillation module then exploits the underlying relations among the RGB, pseudo-Infrared, and Infrared modalities to guide RGB feature learning. Experiments suggest that the proposed dataset and benchmark can help advance research on gesture recognition in dark videos. The dataset and code will be available upon acceptance.
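The two-step idea in the abstract (translate RGB to pseudo-Infrared, then distill progressively across modalities) can be illustrated with a minimal sketch. This is not the authors' MTCD implementation: the abstract gives no loss definition, so the mean-squared-error objective, the two-stage weighting, and every name below (`mse`, `progressive_distillation_loss`, `w_near`, `w_far`) are assumptions made purely for illustration. Features are plain Python lists standing in for network embeddings.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def progressive_distillation_loss(
    rgb_feat: Sequence[float],
    pseudo_ir_feat: Sequence[float],
    ir_feat: Sequence[float],
    w_near: float = 1.0,  # hypothetical weight for the RGB -> pseudo-IR step
    w_far: float = 0.5,   # hypothetical weight for the pseudo-IR -> IR step
) -> float:
    """Assumed two-stage objective, not the paper's formulation.

    Stage 1 pulls RGB features toward pseudo-Infrared features.
    Stage 2 pulls pseudo-Infrared features toward real Infrared features,
    so the pseudo-IR modality acts as a bridge between RGB and IR.
    """
    near = mse(rgb_feat, pseudo_ir_feat)
    far = mse(pseudo_ir_feat, ir_feat)
    return w_near * near + w_far * far


if __name__ == "__main__":
    rgb = [0.2, 0.4, 0.1]
    pseudo_ir = [0.3, 0.5, 0.1]
    ir = [0.6, 0.7, 0.2]
    print(progressive_distillation_loss(rgb, pseudo_ir, ir))
```

In this toy form, the loss is zero only when all three feature vectors coincide; a real system would compute the features with learned encoders and backpropagate through the RGB branch.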
Primary Subject Area: [Engagement] Summarization, Analytics, and Storytelling
Secondary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: Gesture recognition has potential applications in fields such as surveillance, human-computer interaction, autonomous driving, and virtual/augmented reality. Although progress has been made under normal lighting, real-life scenes often require recognition under darker lighting conditions. However, few studies focus on gesture recognition in dark environments, partly because dark videos make up a very low proportion of current benchmark datasets and datasets dedicated to gesture analysis in the dark are lacking. Our work introduces a new multimodal video dataset for gesture recognition in the dark to address this shortage of dark video data.
Supplementary Material: zip
Submission Number: 3162