Abstract: Most group activity recognition models focus mainly on spatio-temporal features of the players in sports games and pay too little attention to the game object, which heavily influences not only individual actions but also the group activity. We propose a new group activity recognition model for sports games that incorporates both the players' motion information and the positional information of the game object. The proposed method uses a transformer encoder for temporal feature extraction and a 'simple' conventional convolutional neural network to extract spatial features and fuse them with features embedding the relative ball position. Experimental results show that our model achieves results comparable to state-of-the-art methods on the Volleyball dataset while using only one transformer encoder block and the ball position.
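To make the described design concrete, the sketch below shows one plausible way to combine per-frame spatial features from a CNN with a ball-position embedding and a single transformer encoder block. It is a minimal illustration under assumptions: the class name, feature dimension, additive fusion, and mean pooling over time are hypothetical choices, not the paper's confirmed implementation.

```python
import torch
import torch.nn as nn

class BallAwareGroupActivityModel(nn.Module):
    """Illustrative sketch (hypothetical names/dimensions): fuse per-frame
    CNN spatial features with a relative ball-position embedding, then model
    temporal dynamics with a single transformer encoder block."""

    def __init__(self, feat_dim=512, num_classes=8, num_heads=8):
        super().__init__()
        # Embed the relative (x, y) ball position into the feature space.
        self.ball_embed = nn.Linear(2, feat_dim)
        # One transformer encoder block, matching the abstract's claim.
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(
                d_model=feat_dim, nhead=num_heads, batch_first=True),
            num_layers=1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, spatial_feats, ball_xy):
        # spatial_feats: (B, T, feat_dim) pooled per-frame player features
        #                from a conventional CNN backbone (not shown here).
        # ball_xy:       (B, T, 2) ball position relative to the scene.
        fused = spatial_feats + self.ball_embed(ball_xy)  # additive fusion (assumed)
        temporal = self.temporal(fused)                   # (B, T, feat_dim)
        return self.classifier(temporal.mean(dim=1))      # group-activity logits


if __name__ == "__main__":
    model = BallAwareGroupActivityModel()
    feats = torch.randn(2, 10, 512)   # batch of 2 clips, 10 frames each
    ball = torch.rand(2, 10, 2)       # normalized ball coordinates
    print(model(feats, ball).shape)   # torch.Size([2, 8])
```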