Automatic Medical Image Report Generation with Multi-view and Multi-modal Attention Mechanism

Published: 2020, Last Modified: 10 May 2023
ICA3PP (3) 2020
Readers: Everyone
Abstract: Medical image report writing is a time-consuming and knowledge-intensive task. However, existing machine learning and deep learning models often generate near-duplicate reports and inaccurate descriptions. To address these issues, we propose a multi-view and multi-modal (MvMM) approach that uses visual features from multiple perspectives together with medical semantic features to generate diverse and accurate medical reports. First, we design a multi-view encoder with attention to extract visual features from the frontal and lateral views. Second, we extract medical concepts from the radiology reports, adopt them as semantic features, and combine them with the visual features through a two-layer decoder with attention. Third, we fine-tune the model parameters using self-critical training with a coverage reward to generate more accurate medical concepts. Experimental results show that our method noticeably improves on the baseline approaches and increases the CIDEr score by 0.157.
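
The third step, self-critical training with a coverage reward, can be illustrated with a minimal Python sketch. The abstract does not say how the coverage reward is computed, so this sketch assumes it is the fraction of reference medical concepts mentioned in the generated report. The function names, the `coverage_weight` parameter, and the `caption_score` input (standing in for a CIDEr score computed elsewhere) are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Set


def concept_coverage(report: str, concepts: Iterable[str]) -> float:
    """Assumed coverage measure: share of reference concepts that
    occur in the report as case-insensitive substrings."""
    wanted: Set[str] = {c.lower() for c in concepts}
    if not wanted:
        return 0.0
    text = report.lower()
    hits = sum(1 for c in wanted if c in text)
    return hits / len(wanted)


def total_reward(report: str, concepts: Iterable[str],
                 caption_score: float, coverage_weight: float = 1.0) -> float:
    """Caption metric (e.g. CIDEr, supplied by the caller) plus a
    weighted coverage term. The weighting scheme is an assumption."""
    return caption_score + coverage_weight * concept_coverage(report, concepts)


def self_critical_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """Self-critical baseline: the reward of the greedily decoded report
    is subtracted from the reward of the sampled report."""
    return sampled_reward - greedy_reward


# Illustrative, made-up inputs (not data from the paper).
concepts = ["cardiomegaly", "pleural effusion"]
sampled = "Mild cardiomegaly with a small pleural effusion."
greedy = "No acute disease."

r_sampled = total_reward(sampled, concepts, caption_score=0.3, coverage_weight=0.5)
r_greedy = total_reward(greedy, concepts, caption_score=0.4, coverage_weight=0.5)
advantage = self_critical_advantage(r_sampled, r_greedy)
print(advantage)
```

In self-critical training the advantage scales the log-probability of the sampled report, so a sampled report that mentions more of the reference concepts than the greedy one is reinforced even when its caption score is slightly lower.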