A Collaborative Multi-modal Fusion Method Based on Random Variational Information Bottleneck for Gesture Recognition

Yang Gu, Yajie Li, Yiqiang Chen, Jiwei Wang, Jianfei Shen

Published: 2021, Last Modified: 22 Jul 2025, MMM (1) 2021, CC BY-SA 4.0

Abstract: Gesture is a typical means of human-machine interaction, and accurate, robust gesture recognition helps achieve more natural interaction and understanding. Multi-modal gesture recognition can improve recognition performance by exploiting the complex relationships among modalities. However, it still faces the challenge of effectively balancing correlation and redundancy among different modalities so as to guarantee accurate and robust recognition. This paper therefore proposes a collaborative multi-modal learning method based on the Random Variational Information Bottleneck (RVIB). Using a random local information selection strategy, part of the information is compressed by the information bottleneck and the rest is retained directly, which makes full use of effective redundant information while eliminating invalid redundant information. Experiments on an open dataset show that the proposed method achieves 95.77% recognition accuracy on 21 dynamic gestures and can preserve recognition accuracy when a modality is missing.
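
The random local selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name random_vib_layer, the compress_ratio parameter, the choice of feature dimensions as the "local" unit, and the fixed unit-variance stand-in encoder are all assumptions made for illustration. The sketch passes a random subset of feature dimensions through a Gaussian bottleneck (reparameterized sampling plus a KL penalty against a standard normal prior) and leaves the remaining dimensions untouched.

```python
import numpy as np


def random_vib_layer(features, compress_ratio=0.5, rng=None):
    """Hypothetical sketch: compress a random subset of feature dimensions
    through a Gaussian bottleneck and retain the rest unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    features = np.asarray(features, dtype=float)
    _, dim = features.shape
    n_compress = int(round(compress_ratio * dim))

    # Randomly choose which dimensions are compressed and which are kept.
    perm = rng.permutation(dim)
    compress_idx = np.sort(perm[:n_compress])
    keep_idx = np.sort(perm[n_compress:])

    # Stand-in encoder (assumption): a trained model would predict the mean
    # and log-variance with learned layers; here the mean is the input itself
    # and the log-variance is fixed at zero (unit variance).
    mu = features[:, compress_idx]
    log_var = np.zeros_like(mu)

    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps

    # KL(N(mu, sigma^2) || N(0, 1)), summed over compressed dimensions
    # and averaged over the batch; this is the compression penalty.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1).mean()

    # Compressed dimensions are replaced by samples; the rest pass through.
    out = features.copy()
    out[:, compress_idx] = z
    return out, kl, compress_idx, keep_idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fused = rng.standard_normal((4, 10))  # toy batch of fused multi-modal features
    out, kl, compressed, kept = random_vib_layer(fused, 0.5, rng)
    print("compressed dims:", compressed)
    print("retained dims:", kept)
    print("KL penalty:", kl)
```

In a training loop, a term like kl would typically be added to the classification loss with a weighting coefficient. How the paper selects local regions, parameterizes the encoder, and weights the penalty is not stated in the abstract.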