Abstract: In this paper, we propose a knowledge-based semantic inference scheme for event recognition in sports video. The framework consists of three layers. At the bottom layer, low-level features are extracted at the frame level and the video is segmented into semantic clips. At the second layer, the semantic clips are mapped to semantic concepts by a neural network and a decision tree. Finally, at the top layer, semantic inference for event recognition is performed on predefined finite-state machine models. Experimental results on event recognition in track and field videos demonstrate the effectiveness and efficiency of our approach.
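To make the top layer concrete, the following is a minimal sketch of finite-state machine inference over a sequence of semantic concepts. It assumes the lower layers have already produced one concept label per semantic clip. The concept names (`run_up`, `take_off`, `landing`), the state names, the `HIGH_JUMP_FSM` model, and the `recognize_events` function are all hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical event model: states, concept labels, and the event name are
# illustrative assumptions, not the models defined in the paper.
HIGH_JUMP_FSM: Dict = {
    "event": "high_jump",
    "start": "idle",
    "accept": "done",
    "transitions": {
        ("idle", "run_up"): "approaching",
        ("approaching", "run_up"): "approaching",
        ("approaching", "take_off"): "airborne",
        ("airborne", "landing"): "done",
    },
}


def recognize_events(concepts: List[str], fsm: Dict) -> List[Tuple[str, int, int]]:
    """Scan a concept sequence and return (event, first_clip, last_clip)
    for every run of clips that drives the FSM from start to accept."""
    transitions = fsm["transitions"]
    events = []
    state = fsm["start"]
    begin = None
    for i, concept in enumerate(concepts):
        nxt = transitions.get((state, concept))
        if nxt is None:
            # No valid transition: reset and retry this concept from the start state.
            state = fsm["start"]
            begin = None
            nxt = transitions.get((state, concept))
            if nxt is None:
                continue
        if state == fsm["start"]:
            begin = i  # first clip of a candidate event
        state = nxt
        if state == fsm["accept"]:
            events.append((fsm["event"], begin, i))
            state = fsm["start"]
            begin = None
    return events


if __name__ == "__main__":
    clip_concepts = ["crowd", "run_up", "run_up", "take_off", "landing", "replay"]
    print(recognize_events(clip_concepts, HIGH_JUMP_FSM))
    # prints [('high_jump', 1, 4)]
```

In this sketch, a concept with no valid transition resets the machine, so partial sequences such as a run-up that is never followed by a take-off are discarded rather than reported as events. Other policies for handling unexpected concepts are equally plausible.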