Knowledge-Enhanced Medical Visual Question Answering: A Survey (Invited Talk Summary)

Haofen Wang, Huifang Du

Published: 2022, Last Modified: 16 Jun 2024APWeb/WAIM Workshops 2022EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Medical Visual Question Answering (Med-VQA) is a task in the field of Artificial Intelligence where a medical image is given with a related question, and the task is to provide an accurate answer to the question. It involves the integration of computer vision, natural language processing, and medical domain knowledge. Furthermore, incorporating medical knowledge in Med-VQA can improve the reasoning ability and accuracy of the answers. While knowledge-enhanced Visual Question Answering (VQA) in the general domain has been widely researched, medical VQA requires further examination due to its unique features. In the paper, we gather information on and analyze the current publicly accessible Med-VQA datasets with external knowledge. We also critically review the key technologies combined with knowledge in Med-VQA tasks in terms of the advancements and limitations. Finally, we discuss the existing challenges and future directions for Med-VQA.