Abstract: Ball trajectory data is among the most fundamental and useful information for evaluating players' performance and analyzing game strategies in ball sports. Although vision-based object tracking techniques have been developed to analyze sports competition videos, it remains challenging to accurately recognize and localize a tiny, high-speed ball in sports such as badminton, tennis, table tennis, and volleyball. This is especially true in broadcast videos, where the low frame rate causes high-velocity objects to appear blurry, leave afterimages, and sometimes disappear entirely. In this paper, we propose a MaxVit Sequential model, based on the MaxVit architecture, to track tiny, high-velocity balls in sports broadcast videos. To address low image quality caused by blurriness, afterimages, and short-term occlusions, the model takes a sequence of consecutive frames as input rather than a single frame. Our approach is motivated by a practical difficulty encountered during manual annotation of ball positions: when the ball becomes indiscernible due to motion blur or occlusion, annotators frequently rely on adjacent frames to infer its position. The experimental results demonstrate that the sequential-frame model surpasses the single-frame model on this task. The proposed model also outperforms baseline and state-of-the-art models across several metrics. Specifically, our model attains an F1 score of 91.49, an accuracy of 85.41, an average precision of 91.16, and a recall of 91.82.
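As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below assumes that the reported precision figure (91.16) is the value that enters the F1 computation; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract (assumed to be on a 0-100 scale).
reported_f1 = f1_score(91.16, 91.82)
print(round(reported_f1, 2))
```

Under that assumption, the recomputed value agrees with the reported F1 score of 91.49 to two decimal places.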