Feature Fusion for Human Activity Recognition using Parameter-Optimized Multi-Stage Graph Convolutional Network and Transformer Models
Abstract: Human activity recognition is a crucial area of research that involves understanding human movements using computer and machine vision technology. Deep learning has emerged as a powerful tool for this task, with models such as Convolutional Neural Networks (CNNs) and Transformers being employed to capture various aspects of human motion. A key contribution of this work is the demonstration that feature fusion improves human activity recognition accuracy, which has important implications for the development of more accurate and robust activity recognition systems. This approach addresses a limitation in the field, where the performance of existing models is often constrained by their inability to capture both spatial and temporal features effectively. This work presents an approach for human activity recognition using sensory data drawn from four distinct datasets: HuGaDB, PKU-MMD, LARa, and TUG. Two models, a Parameter-Optimized Multi-Stage Graph Convolutional Network (PO-MS-GCN) and a Transformer, were trained and evaluated on each dataset, and accuracy and F1-score were computed. The features from the last layer of each model were then combined and fed into a classifier. The findings show that the PO-MS-GCN outperforms state-of-the-art models in human activity recognition: HuGaDB achieved an accuracy of 92.7% and an F1-score of 95.2%, TUG achieved an accuracy of 93.2% and an F1-score of 98.3%, while LARa and PKU-MMD achieved lower accuracies of 64.31% and 69%, respectively, with corresponding F1-scores of 40.63% and 48.16%. Moreover, feature fusion exceeded the PO-MS-GCN's results on the PKU-MMD, LARa, and TUG datasets.
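Below is a minimal sketch of the late feature-fusion step summarized in the abstract, assuming PyTorch and assuming that "combined" means channel-wise concatenation of last-layer features followed by a linear classifier. The class name `FusionClassifier`, the feature dimensions, the batch size, and the class count are illustrative placeholders, not values taken from the paper; the actual PO-MS-GCN and Transformer backbones and the classifier architecture are described in the full text.

```python
import torch
import torch.nn as nn


class FusionClassifier(nn.Module):
    """Hypothetical fusion head: concatenates last-layer features from
    two backbones (e.g., a GCN and a Transformer) and classifies them."""

    def __init__(self, gcn_feat_dim: int, trf_feat_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(gcn_feat_dim + trf_feat_dim, num_classes)

    def forward(self, gcn_features: torch.Tensor, trf_features: torch.Tensor) -> torch.Tensor:
        # Fuse by concatenation along the feature axis:
        # (batch, d1) and (batch, d2) -> (batch, d1 + d2)
        fused = torch.cat([gcn_features, trf_features], dim=-1)
        return self.head(fused)


# Usage with random stand-ins for features extracted from the two trained models.
gcn_features = torch.randn(8, 256)   # placeholder PO-MS-GCN last-layer features
trf_features = torch.randn(8, 128)   # placeholder Transformer last-layer features
classifier = FusionClassifier(gcn_feat_dim=256, trf_feat_dim=128, num_classes=10)
logits = classifier(gcn_features, trf_features)  # shape: (8, num_classes)
```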