Learning a compact embedding for fine-grained few-shot static gesture recognition

Published: 01 Jan 2024 · Last Modified: 07 Mar 2025 · Multimedia Tools and Applications, 2024 · CC BY-SA 4.0
Abstract: Gesture recognition and its applications have been widely studied and have received much attention in recent years. Existing works on hand gesture recognition train classification models over a small set of discrete categories, which suffers from time-consuming data collection and low perceptual granularity. In contrast, this work proposes a contrastive framework for fine-grained few-shot gesture recognition. To achieve this, we construct a general and compact gesture embedding space that can represent arbitrary, intricate hand gestures. The embedding distance between hand gestures is consistent with their perceptual similarity, accurately reflecting subtle variations. To learn such an embedding space, we build a large-scale hand gesture similarity dataset named SimGesture, comprising 944,482 hand-image triplets with gesture-comparison annotations. Based on SimGesture, we use contrastive learning to train a neural network named SimGesNet that projects arbitrary hand images into the compact gesture embedding space. Our experiments demonstrate that the learned embedding supports few-shot gesture recognition and achieves state-of-the-art results. We also show that our proposed gesture embedding outperforms existing embeddings in representing fine-grained gestures.
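The abstract describes contrastive training on annotated image triplets so that embedding distance tracks gesture similarity. A common way to realize this objective is a triplet margin loss over embedding vectors; the sketch below illustrates that standard formulation under assumed choices (Euclidean distance, a hypothetical margin of 0.2), not the paper's exact loss or network.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss (illustrative, not the paper's exact loss).

    Given an anchor gesture embedding, the loss pushes the more-similar
    gesture (positive) closer to the anchor than the less-similar one
    (negative), by at least `margin`. Inputs are embedding vectors, e.g.
    outputs of an embedding network such as SimGesNet.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to similar gesture
    d_neg = np.linalg.norm(anchor - negative)  # distance to dissimilar gesture
    return max(0.0, d_pos - d_neg + margin)

# Toy embeddings: the positive is already closer than the negative by more
# than the margin, so the triplet incurs zero loss.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # similar gesture, distance 0.1
n = np.array([1.0, 0.0])  # dissimilar gesture, distance 1.0
print(triplet_margin_loss(a, p, n))  # 0.0 — ranking constraint satisfied
```

With a margin-based ranking loss like this, only the ordering of distances is supervised, which matches the paper's triplet comparison annotations: annotators judge which of two gestures is more similar to a reference, without needing absolute similarity scores.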