Enhancing Alzheimer’s Disease Diagnosis Records with Large Language Models: A Pipeline for Multimodal and Longitudinal EHRs

ACL ARR 2024 June Submission 3110 Authors

15 Jun 2024 (modified: 10 Jul 2024) · ACL ARR 2024 June Submission · Readers: Everyone, Ethics Reviewers · License: CC BY 4.0

Abstract: Alzheimer's disease (AD) is an incurable neurodegenerative condition and a leading cause of morbidity among individuals over 65 in the US. Screening and early diagnosis of AD usually rely on the patient's electronic health records (EHRs), which include clinical observations, cognitive tests, patient profiles, and medical-imaging-aided diagnoses. However, this information is highly fragmented from a researcher's perspective. Clinical diagnostic notes, among the most critical of these sources, are stored in structured tables that use specialized terminological formats. This makes them hard for non-experts to access and read, which hinders information processing and the development of general medical AI systems. This work proposes a novel pipeline for processing AD clinical diagnostic information: (1) We collect clinical data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the largest AD dataset, explain its abbreviations and terminology, and organize the information so that it is accessible to readers without expert knowledge of AD. (2) Leveraging Large Language Models (LLMs), we present a GPT-based method that transforms tabular clinical data into fluent and faithful natural-language diagnostic reports, as demonstrated by our experimental results. (3) We further explore the inherently multimodal nature of medical data, collecting and processing a total of 10,387 volumetric T1-weighted MRI scans from ADNI. (4) Finally, we discuss the current limitations of applying multimodal EHRs to brain disease analysis and propose forward-looking directions to meet the demands of the neuroimaging domain. We expect this work to provide new insights into the neuroimaging domain and to improve AI applications in healthcare.

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: healthcare applications, cross-modal application, multimodality, NLP datasets

Contribution Types: Data resources, Data analysis

Languages Studied: English

Submission Number: 3110
