Data-Driven Early Prediction of Cerebral Palsy Using AutoML and Interpretable Kinematic Features

Published: 12 Feb 2025, Last Modified: 23 May 2025 | medRxiv | Everyone | CC BY 4.0
Abstract: Early identification of cerebral palsy (CP) remains a major challenge because it relies on expert assessments that are time-intensive and do not scale. Consequently, a range of studies have used machine learning to predict CP scores from motion tracking, e.g. from video data. These studies generally predict clinical scores, which are a proxy for CP risk. However, clinicians are less interested in estimating scores than in estimating a patient's risk of developing clinical symptoms. Here we present a data-driven machine-learning (ML) pipeline that extracts movement features from video-based motion tracking of infants and estimates CP risk using AutoML. Using AutoSklearn, our framework minimizes the risk of overfitting by abstracting away researcher-driven hyperparameter optimization. Trained on movement data from 3- to 4-month-old infants, our classifier predicts a highly indicative clinical score (General Movements Assessment [GMA]) with an ROC-AUC of 0.78 on a held-out test set, indicating that kinematic movement features capture clinically relevant variability. Without retraining, the same model predicts the risk of cerebral palsy outcomes at later clinical follow-ups with an ROC-AUC of 0.74, demonstrating that early motor representations generalize to long-term neurodevelopmental risk. We employ pre-registered lock-box validation to ensure rigorous performance evaluation. This study highlights the potential of AutoML-powered movement analytics for neurodevelopmental screening, demonstrating that data-driven feature extraction from movement trajectories can provide an interpretable and scalable approach to early risk assessment.
By integrating pre-trained vision transformers, AutoML-driven model selection, and rigorous validation protocols, this work advances the use of video-derived movement features for scalable, data-driven clinical assessment, demonstrating how computational methods based on readily available data like infant videos can enhance early risk detection in neurodevelopmental disorders.
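
The evaluation logic described in the abstract (a lock-box split that is set aside once, and a single frozen model scored against two different label sets without retraining) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `roc_auc` and `lockbox_split`, the 20% lock-box fraction, and all toy values are assumptions made for illustration, and the abstract does not specify how the split was made. ROC-AUC is computed here from its pairwise definition (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case) rather than through any particular library.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC via its pairwise definition; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def lockbox_split(n_samples, lockbox_fraction=0.2, seed=0):
    """Return (development, lock-box) index lists (hypothetical helper).

    The lock-box indices are meant to be held back and evaluated exactly
    once, after all modelling decisions have been fixed.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_lock = max(1, int(round(n_samples * lockbox_fraction)))
    return indices[n_lock:], indices[:n_lock]


if __name__ == "__main__":
    # Toy values only; these are NOT data or results from the study.
    frozen_scores = [0.1, 0.4, 0.35, 0.8]   # outputs of one already-trained model
    gma_labels = [0, 0, 1, 1]               # stand-in for the clinical score
    outcome_labels = [0, 1, 0, 1]           # stand-in for the later CP outcome

    # The same scores are evaluated against both targets: no retraining.
    print("AUC vs. clinical score:", roc_auc(gma_labels, frozen_scores))
    print("AUC vs. later outcome:", roc_auc(outcome_labels, frozen_scores))

    dev_idx, lock_idx = lockbox_split(10)
    print("development:", sorted(dev_idx), "lock-box:", sorted(lock_idx))
```

The point of the design is that the lock-box indices never influence model selection, so the reported number is not inflated by repeated peeking at the test data.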