OEMLLM: Ophthalmology Expert MLLM for Various Fundus Disease Assisted Diagnosis

Published: 2025, Last Modified: 11 Feb 2026, ICTAI 2025, CC BY-SA 4.0
Abstract: While Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in general-domain tasks, their application in specialized medical fields, such as the assisted diagnosis of fundus diseases, remains limited by a lack of domain-specific knowledge. To bridge this gap, we introduce FUNDUS-BENCH, a multi-task benchmark dataset tailored for fundus images. Building on FUNDUS-BENCH, we design Ophthalmology Expert MLLM (OEMLLM), a multimodal assisted-diagnosis system that uses a hierarchical feature extraction method based on the Vision Transformer to exploit both low-level lesion features and high-level semantic features of fundus images. OEMLLM further integrates a Large Language Model (LLM) to perform multi-task learning for comprehensive fundus disease diagnosis. Extensive experiments show that OEMLLM outperforms state-of-the-art MLLMs of comparable parameter scale (approximately 2B parameters) and remains competitive with larger-scale models. The dataset and code will be open-sourced shortly to facilitate research and development of practical AI-assisted diagnostic tools in medical applications.