Abstract: Volleyball video analytics requires precisely detecting both the timing and the location of key events. We introduce a novel task, Precise Spatiotemporal Event Spotting, which aims to determine accurately when and where important events occur within a video. To this end, we created the Volleyball Nations League (VNL) Dataset, comprising 8 full games, 1,028 rally videos, and 6,137 events annotated with both temporal and spatial locations. Our best model, the Spatiotemporal Event Spotter (STES), outperforms the current state of the art (SOTA) in temporal action spotting by 9.86 points of mean Temporal Average Precision (mTAP) and achieves 80.21 mAP for spatial localization, pinpointing event locations within a 2-6 pixel range. To the best of our knowledge, this is the first work to address Precise Spatiotemporal Event Spotting in volleyball, and it establishes a strong baseline for future research in this domain. The code and data for this paper are publicly available at: https://hoangqnguyen.github.io/stes
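The abstract reports spatial localization as an mAP tied to a 2-6 pixel range but does not define the metric here. The following is a minimal sketch of one common way such a point-based metric can be computed, not the paper's evaluation protocol. It assumes that a predicted event counts as a true positive when it falls within a pixel-distance threshold of a not-yet-matched ground-truth point in the same frame, that AP is the non-interpolated area under the precision-recall curve, and that the "mean" is taken over the thresholds 2, 3, 4, 5 and 6 pixels. The record layouts and the function names `average_precision` and `mean_ap_over_thresholds` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layouts (assumptions, not the VNL annotation format):
#   ground truth: (frame, x, y); prediction: (frame, x, y, score)
GroundTruth = Tuple[int, float, float]
Prediction = Tuple[int, float, float, float]


def average_precision(preds: List[Prediction], gts: List[GroundTruth], max_dist: float) -> float:
    """Non-interpolated AP for point detections at a single pixel threshold."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp_count = 0
    ap = 0.0
    # Greedy matching: visit predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: -p[3])
    for rank, (frame, x, y, _score) in enumerate(ranked, start=1):
        best_idx, best_dist = -1, max_dist
        for i, (g_frame, gx, gy) in enumerate(gts):
            if matched[i] or g_frame != frame:
                continue
            d = math.hypot(x - gx, y - gy)
            if d <= best_dist:  # keep the nearest unmatched point within the threshold
                best_idx, best_dist = i, d
        if best_idx >= 0:
            matched[best_idx] = True
            tp_count += 1
            # Add the precision at this recall step (each TP raises recall by 1/len(gts)).
            ap += tp_count / rank
    return ap / len(gts)


def mean_ap_over_thresholds(
    preds: List[Prediction],
    gts: List[GroundTruth],
    thresholds: Sequence[float] = (2, 3, 4, 5, 6),  # assumed reading of "2-6 pixels"
) -> float:
    """Average the per-threshold AP values (assumed definition of the 'mean')."""
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = [(0, 10.0, 10.0), (1, 50.0, 50.0)]
    preds = [(0, 11.0, 10.0, 0.9), (1, 54.0, 50.0, 0.8), (2, 0.0, 0.0, 0.5)]
    print(mean_ap_over_thresholds(preds, gts))
```

In this toy example, the second prediction is 4 pixels off, so it is only accepted at thresholds of 4 pixels or more, giving AP values of 0.5, 0.5, 1.0, 1.0 and 1.0 and a mean of 0.8. A per-class mAP would wrap `mean_ap_over_thresholds` in one more average over event types.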