TAL4Tennis: Temporal Action Localization in Tennis Videos Using State Space Models

Ahmed Jouini, Faten Chaieb Chakchouk, Alex Loth, Mohamed Ali Lajnef

Published: 14 Apr 2025, Last Modified: 12 Nov 2025 · OpenReview Archive Direct Upload · License: CC BY-NC-ND 4.0

Abstract: Temporal action localization is a classic computer vision problem in video understanding with a wide range of applications. In sports video analysis, it is integrated into most current solutions used by coaches, broadcasters, and game specialists to support performance analysis and strategy development and to enhance the viewing experience. This work presents an application study of temporal action localization in tennis broadcast videos. We study and evaluate a foundational video understanding model for identifying tennis actions in match footage, and we examine its architecture, in particular its state space model, along the full pipeline from video input to the prediction of temporal segments and their class labels. Our experiments report and interpret the model's performance on tennis data. The model achieves a mean Average Precision (mAP) of 66.14%, averaged over all temporal IoU thresholds, on the TenniSet dataset, outperforming the other methods compared, and 96.16% on our private French Open dataset.
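The abstract reports an mAP averaged "over all thresholds" without listing them. The following is a minimal Python sketch of how such a metric is commonly computed in temporal action localization, assuming ActivityNet/THUMOS-style evaluation. Predictions are matched greedily, in descending score order, to unmatched ground-truth segments by temporal IoU (tIoU). AP is the area under the interpolated precision-recall curve, mAP averages AP over classes, and the reported figure averages mAP over tIoU thresholds. The threshold set (0.3 to 0.7), the data layout, and all function names are hypothetical choices for illustration, not the authors' evaluation code.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]                 # (start, end), e.g. in seconds
Prediction = Tuple[str, float, float, float]  # (video_id, start, end, score)

# Assumption: THUMOS14-style tIoU thresholds; the paper's exact set is not given here.
THRESHOLDS = (0.3, 0.4, 0.5, 0.6, 0.7)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Prediction],
                      gts: Dict[str, List[Segment]],
                      tiou: float) -> float:
    """AP for one class at one tIoU threshold (interpolated PR curve)."""
    n_gt = sum(len(segs) for segs in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {vid: [False] * len(segs) for vid, segs in gts.items()}
    hits: List[int] = []
    # Highest-confidence predictions claim ground-truth segments first.
    for vid, start, end, _ in sorted(preds, key=lambda p: p[3], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(vid, [])):
            if matched[vid][j]:
                continue  # each ground-truth segment can be matched only once
            iou = temporal_iou((start, end), gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= tiou:
            matched[vid][best_j] = True
            hits.append(1)  # true positive
        else:
            hits.append(0)  # false positive (duplicate or poor overlap)
    # Precision after each ranked prediction.
    precision, cum_tp = [], 0
    for i, hit in enumerate(hits):
        cum_tp += hit
        precision.append(cum_tp / (i + 1))
    # Precision envelope: make precision monotonically non-increasing.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Recall rises by 1/n_gt at each true positive, so AP is a sum over hits.
    return sum(p for p, hit in zip(precision, hits) if hit) / n_gt


def mean_ap(preds_by_class: Dict[str, List[Prediction]],
            gts_by_class: Dict[str, Dict[str, List[Segment]]],
            tiou: float) -> float:
    """Mean of per-class AP at a single tIoU threshold."""
    aps = [average_precision(preds_by_class.get(c, []), gts, tiou)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps) if aps else 0.0


def average_map(preds_by_class: Dict[str, List[Prediction]],
                gts_by_class: Dict[str, Dict[str, List[Segment]]],
                thresholds: Sequence[float] = THRESHOLDS) -> float:
    """mAP averaged over all tIoU thresholds."""
    return sum(mean_ap(preds_by_class, gts_by_class, t)
               for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    # Toy data with a hypothetical "serve" class in one video.
    gts = {"serve": {"v1": [(0.0, 2.0), (5.0, 7.0)]}}
    preds = {"serve": [("v1", 0.0, 2.0, 0.9),
                       ("v1", 5.0, 6.0, 0.8),
                       ("v1", 10.0, 11.0, 0.1)]}
    print(f"average mAP: {average_map(preds, gts):.3f}")
```

In the toy example, the second prediction overlaps its ground truth with a tIoU of 0.5. It therefore counts as a hit at the lower thresholds and as a miss at 0.6 and 0.7, which shows why averaging over thresholds rewards precise segment boundaries rather than mere detection.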