A Transformer-based System for Action Spotting in Soccer Videos

Published: 01 Jan 2022, Last Modified: 09 May 2025, MMSports@MM 2022, CC BY-SA 4.0
Abstract: Action spotting in broadcast soccer games is important for understanding salient actions and for video summarization applications. In this paper, we propose an efficient transformer-based system for action spotting in soccer videos. We first use a multi-scale vision transformer to extract features from the videos. We then adopt a sliding window strategy to further exploit temporal features and enhance temporal understanding. Finally, the features are fed to a NetVLAD++ model to obtain the final results. Our model learns a hierarchy of robust representations and performs well in the Action Spotting task of the SoccerNet Challenge 2022, outperforming the baseline and previously published work.
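As a rough illustration of the sliding window step, the sketch below splits a sequence of per-frame features into overlapping windows that a temporal pooling model could consume. The abstract does not state the window length, stride, or padding scheme, so the function name `sliding_windows`, its parameters, and the last-frame padding are all assumptions made for this example and not the authors' implementation.

```python
import numpy as np

def sliding_windows(features, window, stride):
    """Split a (T, D) feature sequence into overlapping (window, D) chunks.

    Hypothetical helper: window size, stride, and padding are assumptions,
    not values taken from the paper.
    """
    num_frames = features.shape[0]
    if num_frames < window:
        # Pad short sequences by repeating the last frame's feature vector.
        pad = np.repeat(features[-1:], window - num_frames, axis=0)
        features = np.concatenate([features, pad], axis=0)
        num_frames = window
    # Start indices of every full window; a trailing partial window is dropped.
    starts = range(0, num_frames - window + 1, stride)
    return np.stack([features[s:s + window] for s in starts])

# Toy input: 10 frames, each with a 2-dimensional feature vector.
feats = np.arange(20, dtype=np.float32).reshape(10, 2)
chunks = sliding_windows(feats, window=4, stride=2)
print(chunks.shape)
```

Each window would then be passed to the temporal pooling stage (NetVLAD++ in the paper) to produce per-window action scores. A smaller stride gives denser temporal coverage at the cost of more windows to process.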