Abstract: The diagnosis and severity assessment of Alzheimer's disease (AD) is a complex procedure that requires clinicians to perform a comprehensive review of multiple factors, including biomarkers, physical and neurological examinations, and other relevant data. This suggests that machine learning algorithms for AD evaluation may perform poorly if they rely on a single or limited source of information. Recently, with advances in deep learning, several studies have assessed AD severity by combining MRI with a small set of tabular data. However, no prior study has attempted to integrate imaging data with language-form information. Recent progress in large language models and multimodal approaches has opened a new window for the integrated processing of vision and language information. In this study, we investigated the use of these multimodal capabilities to evaluate AD severity via the clinical dementia rating (CDR). As inputs to the multimodal neural network, we used both sentences containing various clinical information (e.g., neurological exam results, demographic and diagnostic information) and 3D MRI images. The text was generated from tabular data using GPT-4o to mimic conditions in which natural language information is available. The results demonstrated that the information embedded in the sentences was effectively integrated with the MRI information, producing statistically significant performance gains (in 9 of 19 input conditions) compared with text-only input. This work demonstrates the potential of natural language information and imaging data as synergistic inputs for disease evaluation.
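To make the fusion idea concrete, below is a minimal sketch of how a sentence embedding and a 3D MRI volume might be combined for CDR classification. This is an illustration under assumed choices (a tiny 3D CNN for the MRI encoder, a precomputed text embedding of dimension 768, late fusion by concatenation, and four CDR classes), not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class MultimodalCDRClassifier(nn.Module):
    """Illustrative fusion of a sentence embedding and a 3D MRI volume.

    The encoders, dimensions, and class count are assumptions for
    demonstration, not the authors' reported architecture.
    """

    def __init__(self, text_dim=768, num_classes=4):
        super().__init__()
        # Tiny 3D CNN standing in for the MRI encoder.
        self.mri_encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),  # -> (batch, 16)
        )
        # Project the sentence embedding (e.g., from a pretrained
        # language model) into a small shared space before fusion.
        self.text_proj = nn.Linear(text_dim, 16)
        # Late fusion by concatenation, then a classification head
        # (e.g., CDR 0 / 0.5 / 1 / 2 as four classes).
        self.head = nn.Linear(16 + 16, num_classes)

    def forward(self, mri, text_emb):
        fused = torch.cat(
            [self.mri_encoder(mri), self.text_proj(text_emb)], dim=-1
        )
        return self.head(fused)

# Smoke test with random tensors shaped like a batch of 3D MRI
# volumes and precomputed sentence embeddings.
model = MultimodalCDRClassifier()
mri = torch.randn(2, 1, 64, 64, 64)  # (batch, channel, D, H, W)
text_emb = torch.randn(2, 768)       # (batch, text_dim)
logits = model(mri, text_emb)
print(logits.shape)                  # torch.Size([2, 4])
```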
External IDs: doi:10.1109/access.2025.3624215