Gesture recognition plays a crucial role in natural human-computer interaction and sign language recognition. Despite considerable progress under normal daylight, research dedicated to gesture recognition in dark environments is scarce, partly due to the lack of sufficient datasets for the task. We bridge this gap by collecting a new dataset: a large-scale multimodal video dataset for gesture recognition in darkness (MGR-Dark). MGR-Dark is distinguished from existing gesture datasets by its collection in darkness, its multimodal videos (RGB, Depth, and Infrared), and its high video quality. To the best of our knowledge, this is the first multimodal dataset dedicated to human gestures in high-quality dark videos. Building on it, we propose a Modality Translation and Cross-modal Distillation (MTCD) RGB-IR benchmark framework. Specifically, a modality translator first converts RGB data to pseudo-Infrared data; a progressive cross-modal feature distillation module then exploits the underlying relations among the RGB, pseudo-Infrared, and Infrared modalities to guide RGB feature learning. Experiments demonstrate that the proposed dataset and benchmark can advance research on gesture recognition in dark videos. The dataset and code will be released upon acceptance.
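To make the two-stage pipeline concrete, below is a minimal PyTorch sketch of the training flow the abstract describes: translate RGB to pseudo-Infrared, then distill features from the pseudo-Infrared and Infrared branches into the RGB branch. All module names (ModalityTranslator, rgb_net, ir_net), the translator architecture, and the distillation loss and its weighting are illustrative assumptions for exposition, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityTranslator(nn.Module):
    """Hypothetical RGB -> pseudo-Infrared translator (a small 3D conv encoder-decoder)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, 1, kernel_size=3, padding=1),  # Infrared is single-channel
        )

    def forward(self, rgb):
        # rgb: (B, 3, T, H, W) video clip -> pseudo-IR: (B, 1, T, H, W)
        return self.net(rgb)

def feature_distillation_loss(student_feat, teacher_feat):
    """One common choice of feature-level distillation: MSE between L2-normalized
    features; assumes the two feature tensors have matching shapes."""
    s = F.normalize(student_feat.flatten(1), dim=1)
    t = F.normalize(teacher_feat.flatten(1), dim=1)
    return F.mse_loss(s, t)

def mtcd_step(rgb, ir, labels, translator, rgb_net, ir_net, cls_head, alpha=0.5):
    """One training step: the RGB student is supervised by class labels and guided
    by pseudo-Infrared and Infrared teacher features (teachers are frozen here)."""
    pseudo_ir = translator(rgb)              # stage 1: RGB -> pseudo-Infrared
    f_rgb = rgb_net(rgb)                     # student features from RGB
    with torch.no_grad():
        f_pir = ir_net(pseudo_ir)            # intermediate teacher: pseudo-IR features
        f_ir = ir_net(ir)                    # IR teacher (IR is available at training time)
    logits = cls_head(f_rgb)
    loss = F.cross_entropy(logits, labels)
    # stage 2: cross-modal distillation from both teachers into the RGB branch
    loss = loss + alpha * (feature_distillation_loss(f_rgb, f_pir)
                           + feature_distillation_loss(f_rgb, f_ir))
    return loss

At inference time only rgb_net and cls_head would be needed, which is the point of distilling into the RGB branch: the Infrared stream guides training but is not required for deployment.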