Keywords: Large Language Models, Structured Data, Clinical Notes, Clinical Decision Support
TL;DR: This paper shows that large language models fine-tuned on structured clinical data can accurately predict cardiovascular disease risk and generalize well to unstructured clinical notes at inference time.
Abstract: Clinical machine learning models are typically trained on highly structured and consistent datasets but deployed in real-world settings dominated by unstructured clinical text, creating a fundamental challenge for practical adoption. In this work, we investigate whether large language models (LLMs), fine-tuned on structured patient data, can generalize effectively to unstructured clinical notes at inference time. Using the UK Biobank dataset for cardiovascular disease (CVD) risk prediction, we demonstrate that LLMs trained on structured representations achieve performance comparable to specialized tabular machine learning models. More importantly, we show that these models maintain strong predictive accuracy when applied to unstructured inputs, such as clinical notes, in both zero-shot and few-shot scenarios.
Submission Number: 84