Keywords: multimodal fusion, representation learning, conditional gating, pre-stroke anticipation, intention prediction
Abstract: Predicting the future in dynamic environments requires reasoning about the intentions of agents from rich, multimodal data. We introduce a novel machine learning problem, pre-intervention anticipation: forecasting outcomes before an action is completed by fusing contextual cues with ongoing sensor data. To address this, we propose ConFu, a general neural architecture featuring two key innovations: (1) a conditional gating mechanism that dynamically modulates primary features (e.g., trajectory) based on secondary context (e.g., intention cues), and (2) a cross-fusion strategy for systematic multi-stage integration of heterogeneous modalities. On a real-world badminton dataset comprising 13,582 strokes, ConFu achieves a prediction accuracy of 92.6% with a mean absolute error of 0.20 meters, significantly outperforming existing methods by 7.8-10.5% in accuracy. ConFu also provides immediate tactical feedback, reducing decision time by 85% compared to trajectory-based approaches; this time advantage is particularly valuable for practical applications such as enabling badminton robots to compute interception strategies. Our work establishes a foundation for intention-aware prediction, with broader implications for robotics, autonomous systems, and human-AI interaction. Code will be released for reproducibility (https://anonymous.4open.science/r/AI-Sport18-BFE9/README.md).
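The conditional gating idea described in the abstract can be illustrated with a minimal sketch: a sigmoid gate computed from the secondary context (e.g., intention cues) elementwise modulates the primary feature stream (e.g., a trajectory embedding). This is a generic gating pattern written for illustration; the function name `conditional_gate` and the weight shapes are assumptions, not the paper's actual implementation.

```python
import numpy as np

def conditional_gate(primary, context, W, b):
    """Modulate primary features with a gate derived from secondary context.

    primary: (batch, d_p) primary-modality features, e.g. trajectory embedding
    context: (batch, d_c) secondary-context features, e.g. intention cues
    W, b:    projection from context space to a per-feature gate

    The sigmoid keeps each gate value in (0, 1), so context can only
    attenuate or pass through primary features, never amplify them.
    """
    gate = 1.0 / (1.0 + np.exp(-(context @ W + b)))  # (batch, d_p), in (0, 1)
    return primary * gate

# Toy usage with random features (shapes are illustrative only)
rng = np.random.default_rng(0)
primary = rng.normal(size=(4, 8))
context = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 8))
b = np.zeros(8)
gated = conditional_gate(primary, context, W, b)
```

Because the gate lies strictly in (0, 1), the gated output never exceeds the primary features in magnitude; the context decides, per feature, how much of the primary signal to let through.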
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20040