SKELETRACK: Efficient Tracking of Skeleton in Blurry Videos for Human Activity Recognition

Haoran Qi, Zihan Zhang, Farhana H. Zulkernine

Published: 01 Jan 2024, Last Modified: 11 Feb 2025, CIoT 2024, CC BY-SA 4.0

Abstract: Human pose estimation is the task of predicting the locations and orientations of a person's body joints in an image or video. It has practical applications in areas such as sports analysis, surveillance, and human activity recognition. However, current state-of-the-art approaches struggle to detect fast-moving subjects, which appear blurry in images and videos, a frequent occurrence in real-world scenarios. In this paper, we present the architecture of a top-down multi-person pose estimation framework that addresses this problem, specifically human recognition in blurry videos. Our pipeline uses the spatio-temporal information in the input video to detect and track human skeletons where object tracking algorithms fail, creating an estimated bounding box from the moving target's velocity and direction. We validate our algorithm on the FineGym dataset, which contains fast-moving athletes performing gymnastics and for which the current state-of-the-art accuracy in human activity recognition is 25.2%. Skeletrack achieves 66.55% Top-1 and 89.36% Top-5 mean accuracy in skeleton-based activity recognition on FineGym.
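
The idea of estimating a bounding box from a target's velocity and direction can be illustrated with a minimal sketch. It assumes constant velocity between the two most recent successful detections and holds the box size fixed. The `Box` type and the `extrapolate_box` helper are hypothetical names introduced for illustration, and the abstract does not specify the estimator Skeletrack actually uses.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box in pixels; (x, y) is the top-left corner."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def extrapolate_box(prev, curr, frames_ahead=1):
    """Estimate a box for a frame in which detection failed.

    Assumption (not from the paper): the center displacement between the
    last two detected boxes repeats for each missing frame, and the box
    keeps the size of the most recent detection.
    """
    (px, py), (cx, cy) = prev.center, curr.center
    vx, vy = cx - px, cy - py  # velocity in pixels per frame
    nx = cx + vx * frames_ahead
    ny = cy + vy * frames_ahead
    return Box(nx - curr.w / 2.0, ny - curr.h / 2.0, curr.w, curr.h)


# Two consecutive detections, then one frame where the detector fails.
prev = Box(100.0, 50.0, 40.0, 80.0)
curr = Box(110.0, 46.0, 40.0, 80.0)
estimate = extrapolate_box(prev, curr)
print(estimate)  # Box(x=120.0, y=42.0, w=40.0, h=80.0)
```

A constant-velocity model is the simplest choice; a real tracker might smooth the velocity over more frames or let the box size change as well.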