Abstract: In this paper, we present a new framework that combines
deep semantic segmentation with homography estimation to address
challenges in racket sports court registration from broadcast videos. In
particular, we deal with courts presenting the following problems: (a)
brushed and occluded lines, (b) illumination variations, and (c) unknown
camera parameters. Given an input frame from a broadcast video, our
approach employs an encoder-decoder deep neural network to predict
a precise pixel-level segmentation mask, which is then used to estimate
the homography matrix between the input frame and its reference court
model. For a comprehensive evaluation, we have developed two datasets
for badminton and tennis that meet our specific needs. Since neither
datasets nor code for state-of-the-art methods are publicly available, we
compare our framework with a handcrafted approach that is widely
used as a baseline in racket sports analysis. We show that our
method outperforms the baseline in terms of registration accuracy and
inference latency per frame.
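
The homography step mentioned above can be illustrated with a minimal sketch. It assumes that four or more point correspondences between the reference court model and the frame are already available (for example, court corners read off the predicted mask). The abstract does not say how the mask is converted into an estimate, so the direct linear transform (DLT) used here is a stand-in rather than the authors' method. The function names `estimate_homography` and `project` and the pixel coordinates are hypothetical; the reference coordinates are the standard doubles tennis court dimensions in metres.

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from >= 4 point pairs via DLT."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale so that H[2, 2] == 1

def project(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Reference model: doubles tennis court corners in metres (10.97 x 23.77).
court = [(0.0, 0.0), (10.97, 0.0), (10.97, 23.77), (0.0, 23.77)]
# Hypothetical pixel positions of the same corners in a broadcast frame.
image = [(420.0, 210.0), (860.0, 210.0), (1090.0, 640.0), (190.0, 640.0)]

H = estimate_homography(court, image)
reprojected = project(H, court)
```

With exactly four correspondences the fit is exact, so `reprojected` matches `image` up to floating-point error. With noisy points taken from a real mask, a robust estimator such as RANSAC would typically be wrapped around this step.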