EyeGraphGPT: Knowledge Graph Enhanced Multimodal Large Language Model for Ophthalmic Report Generation
Abstract: Automatic generation of ophthalmic reports holds significant potential to lessen clinicians’ workload, enhance work efficiency, and alleviate the imbalance between the number of clinicians and patients. Recent advancements in multimodal large language models, represented by GPT-4, have demonstrated remarkable performance in the general domain. However, training such models requires a substantial amount of paired image-text data, yet paired ophthalmic data is scarce and ophthalmic reports are laden with specialized terminology, making it challenging to transfer this training paradigm to the ophthalmic domain. In this paper, we propose EyeGraphGPT, a knowledge graph enhanced multimodal large language model for ophthalmic report generation. Specifically, we construct a knowledge graph by leveraging knowledge from a medical database and the expertise of ophthalmic experts to model relationships among ophthalmic diseases, enhancing the model’s focus on key disease information. We then perform relation-aware modal alignment to incorporate knowledge graph features into visual features, and further enhance modality collaboration through visual instruction fine-tuning to adapt the model to the ophthalmic domain. Our experiments on a real-world dataset demonstrate that EyeGraphGPT outperforms previous state-of-the-art models, highlighting its superiority in scenarios with limited medical data and extensive specialized terminology.
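The abstract does not specify how the relation-aware modal alignment is implemented. One common way such knowledge-graph fusion is realized is a cross-attention step in which visual patch features attend over knowledge-graph node embeddings; the sketch below illustrates that pattern. All names (`kg_enhanced_fusion`), dimensions, and the residual-fusion design are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kg_enhanced_fusion(visual, kg_nodes):
    """Hypothetical relation-aware fusion: visual patch features act as
    queries and knowledge-graph node embeddings as keys/values in a
    single cross-attention step, added back residually."""
    d = visual.shape[-1]
    scores = softmax(visual @ kg_nodes.T / np.sqrt(d))  # (patches, nodes)
    return visual + scores @ kg_nodes                   # residual fusion

# Toy example: 7x7 patch grid with 64-dim features, 12 disease nodes
rng = np.random.default_rng(0)
visual = rng.standard_normal((49, 64))
kg = rng.standard_normal((12, 64))
fused = kg_enhanced_fusion(visual, kg)
print(fused.shape)  # (49, 64)
```

In a full model, `visual` would come from an image encoder and `kg_nodes` from a graph encoder over the constructed disease graph; the fused features would then condition the language model during instruction fine-tuning.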