Visual and Domain Knowledge for Professional-level Graph-of-Thought Medical Reasoning

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 spotlight poster
License: CC BY-NC-ND 4.0
Abstract: Medical Visual Question Answering (MVQA) requires AI models to answer questions about medical images, offering significant potential to assist medical professionals in evaluating and diagnosing diseases, thereby improving early interventions. However, existing MVQA datasets primarily focus on basic questions regarding visual perception and pattern recognition, without addressing the more complex questions that are critical in clinical diagnosis and decision-making. This paper introduces a new benchmark designed for professional-level medical reasoning, simulating the clinical decision-making process. We achieve this by collecting MRI and clinical data related to Hypoxic-Ischemic Encephalopathy (HIE), enriched with expert annotations and insights. Building on this data, we generate clinical question-answer pairs and MRI interpretations to enable comprehensive diagnosis, interpretation, and prediction of neurocognitive outcomes. Our evaluation of current large vision-language models (LVLMs) shows limited performance on this benchmark, highlighting both the difficulty of the task and the importance of this benchmark for advancing medical AI. Furthermore, we propose a novel "Clinical Graph of Thoughts" model, which integrates domain-specific medical knowledge and clinical reasoning processes with the interpretive abilities of LVLMs. The model demonstrates promising results, achieving around 15% absolute gain on the most important neurocognitive outcome task, while the benchmark still reveals substantial opportunities for further research innovation.
Lay Summary: This study focuses on making artificial intelligence (AI) better at helping doctors understand and diagnose brain injuries in newborns using MRI scans. Right now, most AI systems that answer questions about medical images are good at spotting simple patterns, but they struggle with the complex thinking doctors do when making real medical decisions. To tackle this, we created a new benchmark using real MRI scans and expert knowledge about a neonatal brain condition called Hypoxic-Ischemic Encephalopathy (HIE). We built a set of medical questions and answers based on expert interpretations, aiming to mimic how doctors analyze MRI images and predict future brain development in affected infants. When we tested current advanced AI models on this challenge, the models didn’t perform well, showing that there’s still a long way to go. We also built a new AI model called the "Clinical Graph of Thoughts," which combines medical knowledge and clinical reasoning. This model did much better, improving prediction accuracy by about 15%, and shows promise for future tools that could support doctors in diagnosing and treating brain injuries more effectively. In short, this work takes a step toward smarter, more helpful AI tools in medicine that reason more like clinicians.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: Medical, Medical Reasoning
Submission Number: 5576