EyeGraphGPT: Knowledge Graph Enhanced Multimodal Large Language Model for Ophthalmic Report Generation
Abstract: Automatic generation of ophthalmic reports holds significant potential to lessen clinicians’ workload, enhance work efficiency, and alleviate the imbalance between the number of clinicians and patients. Recent advancements in multimodal large language models, represented by GPT-4, have demonstrated remarkable performance in the general domain. However, training such models requires a substantial amount of paired image-text data, yet paired ophthalmic data is scarce and ophthalmic reports are laden with specialized terminology, making it challenging to transfer this training paradigm to the ophthalmic domain. In this paper, we propose EyeGraphGPT, a knowledge graph enhanced multimodal large language model for ophthalmic report generation. Specifically, we construct a knowledge graph by leveraging knowledge from a medical database and the expertise of ophthalmic experts to model relationships among ophthalmic diseases, enhancing the model’s focus on key disease information. We then perform relation-aware modal alignment to incorporate knowledge graph features into visual features, and further enhance modality collaboration through visual instruction fine-tuning to adapt the model to the ophthalmic domain. Our experiments on a real-world dataset demonstrate that EyeGraphGPT outperforms previous state-of-the-art models, highlighting its superiority in scenarios with limited medical data and extensive specialized terminology.
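The abstract does not specify how the relation-aware modal alignment is implemented. One common way such knowledge-graph fusion is realized is a cross-attention step in which visual patch features attend over knowledge-graph node embeddings; the sketch below illustrates that pattern. All names (`kg_enhanced_fusion`), dimensions, and the residual-fusion design are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kg_enhanced_fusion(visual, kg_nodes):
    """Hypothetical relation-aware fusion: visual patch features act as
    queries and knowledge-graph node embeddings as keys/values in a
    single cross-attention step, added back residually."""
    d = visual.shape[-1]
    scores = softmax(visual @ kg_nodes.T / np.sqrt(d))  # (patches, nodes)
    return visual + scores @ kg_nodes                   # residual fusion

# Toy example: 7x7 patch grid with 64-dim features, 12 disease nodes
rng = np.random.default_rng(0)
visual = rng.standard_normal((49, 64))
kg = rng.standard_normal((12, 64))
fused = kg_enhanced_fusion(visual, kg)
print(fused.shape)  # (49, 64)
```

In a full model, `visual` would come from an image encoder and `kg_nodes` from a graph encoder over the constructed disease graph; the fused features would then condition the language model during instruction fine-tuning.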