Multimodal Multi-Stream Deep Learning for Egocentric Activity Recognition

CVPR Workshops 2016
Abstract: In this paper, we propose a multimodal multi-stream deep learning framework that uses both video and sensor data to tackle the egocentric activity recognition problem. First, we extend and evaluate a multi-stream Convolutional Neural Network to learn spatial and temporal features from egocentric videos. Second, we propose a multi-stream Long Short-Term Memory architecture to learn features from multiple sensor streams (accelerometer, gyroscope, etc.). Third, we propose a two-level fusion technique and experiment with different pooling schemes to compute the final prediction. Experimental results on a multimodal egocentric dataset show that the proposed method achieves encouraging performance, despite the still limited scale of existing egocentric datasets.
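The abstract itself contains no code; the following is a minimal illustrative sketch of the two ideas it names, a multi-stream LSTM over sensor streams and two-level (feature-level, then decision-level) fusion, assuming a PyTorch implementation. All names, dimensions, and the averaging scheme are hypothetical, not the authors' exact design.

# Hypothetical sketch: one LSTM per sensor stream, feature-level fusion
# by concatenation, then decision-level fusion with the video scores.
import torch
import torch.nn as nn

class MultiStreamLSTM(nn.Module):
    def __init__(self, stream_dims, hidden_dim, num_classes):
        super().__init__()
        # One LSTM per sensor stream (e.g., accelerometer, gyroscope).
        self.lstms = nn.ModuleList(
            nn.LSTM(d, hidden_dim, batch_first=True) for d in stream_dims
        )
        self.classifier = nn.Linear(hidden_dim * len(stream_dims), num_classes)

    def forward(self, streams):
        # streams: list of tensors, each of shape (batch, time, stream_dim).
        feats = []
        for lstm, x in zip(self.lstms, streams):
            out, _ = lstm(x)
            feats.append(out[:, -1])      # last hidden state per stream
        fused = torch.cat(feats, dim=1)   # first-level (feature) fusion
        return self.classifier(fused)     # class scores for the sensor branch

def decision_fusion(video_scores, sensor_scores):
    # Second-level (decision) fusion by average pooling of softmax scores;
    # the paper experiments with different pooling schemes at this stage.
    probs = torch.softmax(video_scores, dim=1) + torch.softmax(sensor_scores, dim=1)
    return probs / 2

# Example usage with two 3-axis streams (accelerometer and gyroscope):
model = MultiStreamLSTM(stream_dims=[3, 3], hidden_dim=64, num_classes=20)
acc = torch.randn(8, 100, 3)
gyro = torch.randn(8, 100, 3)
sensor_scores = model([acc, gyro])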