Keywords: Clinical language models, electronic healthcare records, disease prediction, pretrained models, type 2 diabetes, time series, fairness
Abstract: Inequalities are known to exist throughout diabetes care, resulting in poorer health outcomes for ethnic minority groups and people from disadvantaged backgrounds. As foundation models are increasingly deployed to assist with diagnosis and to predict healthcare usage, it is essential to understand how they may exacerbate existing biases. We assess the fairness of long-term microvascular complication predictions for individuals living with Type 2 Diabetes. We encoded each individual's entire structured clinical record as text in order to take advantage of the existing knowledge within pretrained clinical language models. Leveraging large-scale EHR data from the UK, we predict the risk of microvascular complications in individuals with Type 2 Diabetes across 6-, 12-, 36- and 60-month prediction windows and assess performance using three fairness metrics: sensitivity, specificity and demographic parity. We find that models demonstrate statistically significant gaps in performance across protected characteristics such as sex, ethnic group and level of deprivation. These performance gaps were particularly pronounced for ethnic minority groups and for those with missing or unknown ethnicity status.
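The three fairness metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes binary (already thresholded) predictions for a single prediction window, treats demographic parity as the difference in positive-prediction rates between groups, and summarises each metric by its largest between-group gap. The function names `group_fairness_metrics` and `max_gap` are hypothetical, and the abstract does not say how statistical significance was tested, so no significance test is included. Missing or unknown ethnicity is kept as its own group rather than dropped, since the abstract reports the largest gaps there.

```python
import math
from collections import defaultdict


def group_fairness_metrics(y_true, y_pred, groups):
    """Per-group sensitivity, specificity and positive-prediction rate.

    Hypothetical helper. y_true / y_pred are 0/1 labels (1 = complication
    within the prediction window); groups holds a protected-characteristic
    label per individual (e.g. an ethnic group, with 'Unknown' kept as-is).
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1 and p == 1:
            c["tp"] += 1
        elif t == 1:
            c["fn"] += 1
        elif p == 1:
            c["fp"] += 1
        else:
            c["tn"] += 1

    metrics = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]  # individuals who developed a complication
        neg = c["tn"] + c["fp"]  # individuals who did not
        metrics[g] = {
            # Sensitivity (true-positive rate); NaN if the group has no positives.
            "sensitivity": c["tp"] / pos if pos else float("nan"),
            # Specificity (true-negative rate); NaN if the group has no negatives.
            "specificity": c["tn"] / neg if neg else float("nan"),
            # Share predicted positive: the quantity compared for demographic parity.
            "positive_rate": (c["tp"] + c["fp"]) / (pos + neg),
        }
    return metrics


def max_gap(metrics, key):
    """Largest between-group difference for one metric, ignoring NaN values."""
    vals = [m[key] for m in metrics.values() if not math.isnan(m[key])]
    return max(vals) - min(vals)


if __name__ == "__main__":
    # Toy data for two hypothetical groups, purely for illustration.
    y_true = [1, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    groups = ["A", "A", "A", "B", "B", "B"]
    m = group_fairness_metrics(y_true, y_pred, groups)
    for key in ("sensitivity", "specificity", "positive_rate"):
        print(key, round(max_gap(m, key), 3))
```

On the toy data, group A has sensitivity 0.5 and specificity 1.0 while group B has the reverse, so both gaps are 0.5, and the positive-prediction rates (1/3 vs 2/3) give a demographic parity gap of about 0.333. In practice the same computation would be repeated for each prediction window and each protected characteristic.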
Submission Number: 92