MeFD-Net: multi-expert fusion diagnostic network for generating radiology image reports

Ruisheng Ran, Renjie Pan, Wen Yang, Yan Deng, Wenfeng Zhang, Wei Hu, Qibing Qin

Published: 2024, Last Modified: 28 Jan 2026. Appl. Intell. 2024. License: CC BY-SA 4.0

Abstract: Radiology image report generation has recently attracted considerable research interest. Progress on this task can relieve radiologists of tedious reporting work and help raise healthcare standards in underdeveloped regions. Previous methods have largely followed the image captioning task, using an encoder-decoder architecture to forcibly align the visual and textual domains, and in doing so have overlooked the cross-modal semantic gap between them. Inspired by the multi-expert collaborative diagnosis model used in hospitals, we develop a “multi-expert diagnostic” mechanism to bridge the gap between these modalities. To this end, we propose the Multi-expert Diagnostic Module (MeDM), whose key design introduces multiple learnable matrices that play the role of the experts’ brains in interactive learning between radiology images and their corresponding reports. Specifically, each expert matrix interacts with the visual-textual features to capture rich multimodal information. To ensure that different expert matrices focus on different feature information, they are constrained by an orthogonal loss. Additionally, we design a lightweight Diagnostic Fusion Module (DFM) to integrate and summarize the results from the multiple expert matrices. Experimental results on two widely used datasets show that the proposed method leads on most metrics.
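
The abstract does not give the form of the orthogonal loss, so the following is only a minimal sketch of one common way to penalize overlap between expert matrices: flatten each matrix and sum the squared cosine similarities over all distinct pairs. The function names (`flatten`, `orthogonal_loss`), the cosine-based formulation, and the toy 2x2 experts are assumptions for illustration, not the paper's implementation. A real model would use a tensor library with automatic differentiation rather than plain Python lists.

```python
import math


def flatten(matrix):
    """Flatten a matrix (list of rows) into a single list of floats."""
    return [value for row in matrix for value in row]


def orthogonal_loss(experts):
    """Sum of squared cosine similarities over all distinct expert pairs.

    Assumed form, not taken from the paper: the loss is 0 when every pair
    of flattened expert matrices is orthogonal and grows as experts align.
    """
    vectors = [flatten(m) for m in experts]
    norms = [math.sqrt(sum(v * v for v in vec)) for vec in vectors]
    loss = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            # Small epsilon guards against division by zero for empty experts.
            cosine = dot / (norms[i] * norms[j] + 1e-12)
            loss += cosine ** 2
    return loss


# Hypothetical toy experts: two orthogonal matrices.
expert_a = [[1.0, 0.0], [0.0, 0.0]]
expert_b = [[0.0, 1.0], [0.0, 0.0]]

print(orthogonal_loss([expert_a, expert_b]))  # orthogonal pair: no penalty
print(orthogonal_loss([expert_a, expert_a]))  # identical pair: penalized
```

Under this assumed formulation, adding the penalty to the report-generation objective would push the experts toward attending to different feature directions, which is the stated purpose of the constraint.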