Consistency-Conditioned Memory-Augmented Dynamic Diagnosis Model for Medical Visual Question Answering
Abstract: Medical Visual Question Answering (Med-VQA) holds immense promise as a medical assistance tool, offering timely diagnostic outcomes based on medical images and accompanying questions, thereby supporting medical professionals in making accurate clinical decisions. However, Med-VQA is still in its infancy, with existing solutions falling short in imitating human diagnostic processes and in ensuring result consistency. To address these challenges, we propose a Consistency-Conditioned Memory-augmented Dynamic diagnosis model (CoCoMeD) incorporating two core components: a dynamic memory diagnosis engine and a consistency-conditioned enforcer. The dynamic memory diagnosis engine enables intricate diagnostic interactions by retaining vital visual cues from medical images and iteratively updating pertinent memories. This dynamic reasoning capability mirrors the cognitive processes of skilled medical diagnosticians, effectively enhancing the model's ability to reason over diverse medical visual facts and patient-specific questions. Moreover, to strengthen diagnostic coherence, the consistency-conditioned enforcer imposes coherence constraints linking interrelated questions about identical medical facts, ensuring the credibility and reliability of its diagnostic outcomes. Additionally, we present C-SLAKE, an extended Med-VQA dataset encompassing diverse medical image types and categorized diagnostic question-answer pairs, enabling consistency-aware Med-VQA evaluation on rich medical sources. Comprehensive experiments on DME and C-SLAKE showcase CoCoMeD's superior performance and its potential to advance trustworthy multi-source medical question answering.