Anatomical Structure-Aware Image Difference Graph Learning for Difference-Aware Medical Visual Question Answering

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: Chest X-ray, Difference Image VQA, medical dataset, Graph Neural Networks
Abstract: To contribute to automating medical vision-language modeling, we propose a novel Chest X-ray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task asks questions about the diseases present in each image and, more importantly, the differences between them. This mirrors radiologists' diagnostic practice of comparing the current image with a reference before writing the report. For this task, we propose a new dataset, MIMIC-Diff-VQA, comprising 700,821 QA pairs over 109,872 image pairs. We also propose a novel expert knowledge-aware graph representation learning model for this problem. We leverage expert knowledge, including anatomical structure priors and semantic and spatial relationships, to construct a multi-relationship graph that represents the differences between the two images for the image difference VQA task. Our dataset and code will be released upon publication. We believe this work will further advance medical vision-language models.
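To make the multi-relationship graph idea concrete, here is a minimal, hypothetical sketch of such a structure: anatomical regions from the main and reference images become nodes, and typed edges encode spatial, semantic, and cross-image relations. All class names, relation labels, and region names below are illustrative assumptions, not the paper's actual API or dataset schema.

```python
from dataclasses import dataclass, field

@dataclass
class DiffGraph:
    """Toy multi-relationship graph over anatomical regions (illustrative only)."""
    nodes: dict = field(default_factory=dict)   # region name -> feature vector
    edges: list = field(default_factory=list)   # (src, dst, relation) triples

    def add_region(self, name, feature):
        self.nodes[name] = feature

    def add_edge(self, src, dst, relation):
        # Assumed relation types: spatial adjacency within one image,
        # semantic (e.g., disease co-occurrence), and cross-image
        # correspondence between main and reference regions.
        assert relation in {"spatial", "semantic", "cross-image"}
        self.edges.append((src, dst, relation))

    def edges_of(self, relation):
        return [e for e in self.edges if e[2] == relation]

# Build a tiny graph for one main/reference image pair.
g = DiffGraph()
g.add_region("main:left_lung", [0.1, 0.9])
g.add_region("main:heart", [0.4, 0.2])
g.add_region("ref:left_lung", [0.3, 0.7])
g.add_edge("main:left_lung", "main:heart", "spatial")
g.add_edge("main:left_lung", "ref:left_lung", "cross-image")

print(len(g.edges_of("spatial")))      # 1
print(len(g.edges_of("cross-image")))  # 1
```

A graph neural network would then propagate messages along each relation type separately and pool the result to answer difference questions; that step is omitted here.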
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
TL;DR: Large scale image difference medical VQA dataset and expert knowledge-aware graph representation learning