Cricket Scene Analysis Using the RetinaNet Architecture

Tevin Moodley, Dustin van der Haar

Published: 01 Jan 2021, Last Modified: 05 Nov 2025, Crossref, CC BY-SA 4.0

Abstract: Object detection in sporting environments poses growing challenges, and multiple solutions have been proposed across various domains. Sporting video footage contains changing environments, moving objects and actors, and overlapping objects, all of which make detecting and classifying different objects difficult. With the introduction of deep learning, however, researchers now have methods that learn semantic, high-level, and deeper features, which can be used to address open problems in existing research. Cricket is a sporting domain that exhibits many of these challenges, with multiple moving actors and objects. This paper implements the RetinaNet architecture to detect and classify multiple objects within a scene. Seven different objects/classes are addressed: fielder, batsman, non-striker, bowler, umpire, ball, and wicket-keeper. Following dataset preparation, a RetinaNet model is trained on the images using transfer learning, and it achieves a mean average precision of 86.78%. The trained model achieves precision scores above 98% for every class except the ball class. Further investigation attributes the poor performance of the ball class to occlusion and the ball’s small size relative to the overall frame. The proposed model can successfully detect and classify the different objects/classes within a cricket scene and serves as a promising foundation for further research within the cricketing domain.
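The relationship between the reported mean average precision and the per-class scores can be illustrated with a little arithmetic. The sketch below is not the authors' evaluation code. It assumes that the per-class precision scores are average-precision values and that the mean is taken unweighted over the seven class names listed in the abstract; neither is stated there. The function names and the per-class values in `hypothetical_ap` are invented for illustration and are not results from the paper.

```python
from statistics import fmean


def mean_average_precision(per_class_ap):
    """Unweighted mean of per-class average precision values (in percent)."""
    return fmean(per_class_ap.values())


def implied_upper_bound(mean_ap, n_classes, n_strong, strong_floor):
    """Upper bound on the mean AP of the remaining classes.

    If n_strong of the n_classes each score at least strong_floor, the
    remaining classes can average at most the value returned here while
    still producing the overall mean_ap.
    """
    remaining = n_classes - n_strong
    return (mean_ap * n_classes - strong_floor * n_strong) / remaining


# Hypothetical per-class values, NOT results from the paper.
hypothetical_ap = {
    "fielder": 99.0,
    "batsman": 99.0,
    "non-striker": 99.0,
    "bowler": 99.0,
    "umpire": 99.0,
    "wicket-keeper": 99.0,
    "ball": 15.0,
}

# One weak class is enough to pull the unweighted mean well below the rest.
print(f"mAP over hypothetical values: {mean_average_precision(hypothetical_ap):.2f}%")

# Under the stated assumptions (seven classes, six of them at 98% or above),
# the abstract's 86.78% mean caps what the one remaining class can score.
bound = implied_upper_bound(mean_ap=86.78, n_classes=7, n_strong=6, strong_floor=98.0)
print(f"Implied upper bound for the remaining class: {bound:.2f}%")
```

Under these assumptions, a single low-scoring class is enough to account for the gap between a mean of 86.78% and per-class scores above 98%, which is consistent with the abstract singling out the ball class.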

External IDs: doi:10.1007/978-3-030-93420-0_19