Abstract: Acoustic-based human gesture recognition (HGR) has drawn increasing academic attention as a way to overcome the shortcomings of conventional interaction methods on tiny devices. Existing techniques following a learning-based routine require collecting massive amounts of application-specific training data. Worse, the cross-domain problem induces additional retraining overhead to enable such systems to recognize unseen gestures in different environments, which clearly decreases their scalability and prevents real-world deployment. Although some recent works propose few-shot learning solutions to the cross-domain problem in HGR, they suffer from being application-specific, incurring high training overhead, and/or being unable to recognize unseen gestures. In this paper, we propose PreGesNet, a few-shot acoustic gesture recognition framework based on task-adaptive pretrained networks, whose novelty lies in three aspects: i) leveraging a pretrained feature extractor that captures generic knowledge from our collected and open-source large-scale gesture datasets; ii) designing a task-specific parameter adaptation mechanism that efficiently adapts the pretrained feature extractor to each target task; iii) discovering a distance metric and task generation strategy suited to the HGR application. According to the experiments, when the model is trained on 10 digit gestures, its recognition accuracies on 26 kinds of letter gestures and 8 kinds of other hand gestures reach up to 80.5% and 93.4% with only two shots, respectively. In addition, the average recognition latency of PreGesNet is less than 0.4 seconds.