Abstract: Current action recognition algorithms in ice hockey do not fully exploit the temporal cues available in video. To address this limitation, we introduce a two-stream network that uses player pose sequences and optical flow features to recognize hockey actions. Player pose sequences are compact representations consisting of frame-by-frame human and stick joint locations and the angles between joints. The optical flow features are obtained with a state-of-the-art optical flow algorithm. The player pose sequences are processed by a two-layer long short-term memory (LSTM) network, and the LSTM output is fused with optical flow features processed by a convolutional neural network (CNN). Experimental results demonstrate the efficacy of the method: it achieves 90.48% test accuracy on the HARPET (Hockey Action Recognition Pose Estimation, Temporal) dataset, surpassing the current benchmark by 5%. The network also performs better than the current benchmark at distinguishing similar classes such as passing and shooting. Compared with the benchmark on the HARPET dataset, it uses 90% fewer parameters and 80% fewer floating-point operations (FLOPs), making it substantially more efficient.
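As an illustration of the pose-sequence representation described above, the following minimal sketch builds a per-frame feature vector from joint locations and the angles between joints. The abstract does not specify how the angles are computed, so everything here is an assumption: joints are taken to be 2D (x, y) keypoints, an angle is measured at the middle joint of a triplet, and the names `joint_angle`, `pose_features`, `pose_sequence`, and `triplets` are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at joint b between segments b->a and b->c (assumed 2D)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        # Degenerate case (coincident joints): return 0.0 by convention here.
        return 0.0
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cos_t)))


def pose_features(frame, triplets):
    """Flatten the joint coordinates of one frame and append one angle per triplet.

    frame:    list of (x, y) joint locations (player and stick joints).
    triplets: list of (i, j, k) joint indices; the angle is measured at joint j.
    """
    feats = [coord for joint in frame for coord in joint]
    feats.extend(joint_angle(frame[i], frame[j], frame[k]) for i, j, k in triplets)
    return feats


def pose_sequence(frames, triplets):
    """Build the frame-by-frame feature sequence that a recurrent model could consume."""
    return [pose_features(frame, triplets) for frame in frames]
```

Under these assumptions, a clip of T frames with J joints and A angle triplets yields a T x (2J + A) sequence, which is the kind of compact input a two-layer LSTM could process in place of raw pixels.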