Learning to Anticipate: A Conditional Representation Fusion Network for Pre-Stroke Prediction

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: multimodal fusion, representation learning, conditional gating, pre-stroke anticipation, intention prediction
Abstract: Predicting the future in dynamic environments requires reasoning about the intentions of agents from rich, multi-modal data. We introduce a novel machine learning problem: pre-intervention anticipation—forecasting outcomes before an action is completed by fusing contextual cues with ongoing sensor data. To address this, we propose ConFu, a general neural architecture featuring two key innovations: (1) a conditional gating mechanism that dynamically modulates primary features (e.g., trajectory) based on secondary context (e.g., intention cues), and (2) a cross-fusion strategy for systematic multi-stage integration of heterogeneous modalities. Our model achieves a prediction accuracy of 92.6% with a mean absolute error of 0.20 meters, outperforming existing methods by 7.8–10.5% in accuracy. Experimental validation on a real-world badminton dataset comprising 13,582 strokes demonstrates that ConFu provides immediate tactical feedback, reducing decision time by 85% compared to trajectory-based approaches. This time advantage is particularly valuable for practical applications such as enabling badminton robots to compute interception strategies. Our work establishes a foundation for intention-aware prediction, with broader implications for robotics, autonomous systems, and human-AI interaction. Code will be released for reproducibility (https://anonymous.4open.science/r/AI-Sport18-BFE9/README.md).
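The conditional gating idea from the abstract—modulating primary features (e.g., trajectory) by a gate computed from secondary context (e.g., intention cues)—can be sketched as follows. This is a minimal NumPy illustration under assumed feature dimensions; the parameters `W_g`, `b_g` and all shapes are hypothetical and not taken from the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_primary, d_context = 8, 4  # assumed embedding sizes

# Hypothetical gate parameters: project context cues to one gate per primary feature.
W_g = rng.normal(scale=0.1, size=(d_primary, d_context))
b_g = np.zeros(d_primary)

def conditional_gate(primary, context):
    """Element-wise modulation of primary features by a context-conditioned gate."""
    g = sigmoid(W_g @ context + b_g)  # gate values in (0, 1), shape (d_primary,)
    return g * primary

primary = rng.normal(size=d_primary)   # e.g., a trajectory embedding
context = rng.normal(size=d_context)   # e.g., an intention-cue embedding
fused = conditional_gate(primary, context)
print(fused.shape)  # (8,)
```

In a full cross-fusion architecture such a gate would be applied at multiple integration stages, with each modality alternately acting as the conditioning signal.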
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20040