Explainable Machine Learning for Neonatal Screening: A Fast&Frugal Decision Tree for Rare Metabolic Disease Detection

Published: 01 Jan 2025. Last Modified: 31 Jul 2025. Venue: AIME (1) 2025. License: CC BY-SA 4.0.
Abstract: Neonatal metabolic screening is a cornerstone of preventive medicine, enabling early detection and treatment of severe metabolic disorders. However, interpreting complex biomarker profiles and handling data imbalance, particularly when detecting rare diseases, remain significant challenges that call for robust and interpretable diagnostic tools. Current systems, such as Collaborative Laboratory Integrated Reports (CLIR), are limited in data adaptability, interpretability, and performance across diverse populations. This study develops and evaluates a Fast-and-Frugal Decision Tree (FFT) model tailored to detecting neonatal metabolic abnormalities. Using data from Lombardy's neonatal screening program, which comprises 985,792 samples collected between 2012 and 2022, the FFT was trained on a stratified subsample of 100,198 observations, with the goals of optimizing computational efficiency and minimizing false negatives. The model showed strong diagnostic performance, with a positive predictive value (PPV) of 0.99, sensitivity of 0.85, F1 score of 0.91, and F2 score of 0.88, and remained robust to missing data and label imbalance. Key biomarkers, such as phenylalanine and citrulline, were identified as significant contributors to classification. These findings highlight the potential of integrating explainable machine learning models into neonatal screening workflows to enhance diagnostic accuracy, support early detection of rare diseases, and improve neonatal care. Future work should expand the datasets and conduct prospective validation to further improve sensitivity and generalizability.
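The abstract does not describe how the tree's cues, thresholds, and exits were learned, so the following is only a minimal Python sketch of how a fast-and-frugal tree classifies a sample and how the reported metrics are defined. Everything concrete in it is a hypothetical assumption: the `Cue` class, the helpers `fft_classify` and `screening_metrics`, the cue names `phe` and `cit`, their order, thresholds, and exit directions, and the toy standardized (z-score) values, which are not clinical cut-offs or real screening data. The sketch also does not model the missing-data handling mentioned in the abstract; a missing biomarker here simply raises `KeyError`.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Cue:
    """One node of a fast-and-frugal tree (hypothetical parameters)."""
    name: str         # key of the biomarker in a sample dict
    threshold: float  # cue is "positive" when the value exceeds this
    exit_on: bool     # True: exit as abnormal on a positive cue; False: exit as normal on a negative cue


def fft_classify(sample: Dict[str, float], cues: Sequence[Cue]) -> bool:
    """Return True to flag the sample as abnormal, False for normal."""
    if not cues:
        raise ValueError("an FFT needs at least one cue")
    # Every cue except the last has a single exit; otherwise fall through to the next cue.
    for cue in cues[:-1]:
        positive = sample[cue.name] > cue.threshold
        if positive == cue.exit_on:
            return positive
    # The final cue has two exits, so every sample receives a decision.
    last = cues[-1]
    return sample[last.name] > last.threshold


def screening_metrics(y_true: List[bool], y_pred: List[bool]) -> Dict[str, float]:
    """PPV, sensitivity, and F-beta scores for the positive (abnormal) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0

    def f_beta(beta: float) -> float:
        # F_beta = (1 + b^2) * P * R / (b^2 * P + R); beta > 1 favours sensitivity.
        b2 = beta ** 2
        denom = b2 * ppv + sens
        return (1 + b2) * ppv * sens / denom if denom else 0.0

    return {"ppv": ppv, "sensitivity": sens, "f1": f_beta(1.0), "f2": f_beta(2.0)}


# Hypothetical tree on standardized values; placeholder thresholds, not clinical cut-offs.
CUES = [
    Cue("phe", 2.0, exit_on=True),  # high phenylalanine score -> flag immediately
    Cue("cit", 2.5, exit_on=True),  # final cue: high citrulline score -> abnormal, else normal
]

# Toy samples and ground truth (invented for illustration); the last case is missed.
samples = [
    {"phe": 3.1, "cit": 0.2},
    {"phe": 0.5, "cit": 2.8},
    {"phe": 0.1, "cit": 0.4},
    {"phe": 1.5, "cit": 1.9},
]
labels = [True, True, False, True]
preds = [fft_classify(s, CUES) for s in samples]
print(screening_metrics(labels, preds))
```

In the weighted harmonic mean behind F2, sensitivity carries β² = 4 times the weight of PPV, so F2 tracks sensitivity more closely than F1 does. That makes it a natural summary when, as in screening, a missed case costs more than a false alarm.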