Keywords: machine learning, time series, evaluation, distribution shifts, cardiotocography, fetal health, maternal health
TL;DR: We employ deep learning methods for automated cardiotocography interpretation, investigating the choice of label on classification performance, robustness to temporal distribution shifts, and subgroup performance disparities.
Abstract: The inherent variability in the visual interpretation of cardiotocograms (CTGs) by obstetric clinical experts, both intra- and inter-observer, presents a substantial challenge in obstetric care. In response, we investigate automated CTG interpretation as a potential solution to enhance the early detection of fetal hypoxia during labor, which has the potential to reduce unnecessary operative interventions and improve overall maternal and neonatal care. This study employs deep learning techniques to reduce the subjectivity associated with visual CTG interpretation. Our results demonstrate that using objective cord blood pH outcome measurements, rather than clinician-defined Apgar scores, yields more consistent and robust model performance. Additionally, through a series of ablation studies, we explore the impact of temporal distribution shifts on the performance of these deep learning models. We examine tradeoffs between performance and fairness, specifically evaluating performance across demographic and clinical subgroups. Finally, we discuss the practical implications of our findings for the real-world deployment of such systems, emphasizing their potential utility in medical settings with limited resources.
Submission Number: 29
Loading