Access to Care and its Implications for EHR Reliability and Clinical Risk Prediction Model Performance

Published: 01 Nov 2024, Last Modified: 16 Jan 2025 | ML4H | Everyone | CC BY 4.0
Abstract: Disparities in access to healthcare have been well documented in the United States, but their effects on electronic health record (EHR) data reliability and the resulting clinical models are poorly understood. Using an All of Us dataset of 134,513 participants, we investigate the effects of access to care on the medical machine learning pipeline, including medical condition rates, data quality, outcome label accuracy, and prediction performance. Our findings reveal that patients with lower access to care have worse EHR reliability, as measured by patient self-reported conditions, for 78% of examined medical conditions. In two medical settings, we demonstrate that clinical risk prediction performance can be worse for patients with lower access to care, with AUC gaps of 3-7%. We develop solutions to mitigate these disparities and find that including patient self-reported conditions improves performance for patients with lower access to care, yielding up to 4.5% higher AUC. More broadly, our results illustrate that access to care may have significant effects on medical data and model development.