From Perception to Reasoning: Image-Grounded Knowledge Graphs for Topology-Aware Medical Analysis in Abdominal CT

Published: 26 Apr 2026, Last Modified: 26 Apr 2026 · Med-Reasoner 2026 Poster · CC BY 4.0
Keywords: Vision-Language Models, Medical Image Analysis, Knowledge Graph Reasoning, Neuro-Symbolic AI, Topology-Aware Reasoning, Medical Vision-Language Systems, Abdominal CT Analysis, Explainable Medical AI
TL;DR: We introduce image-grounded Vision Knowledge Graphs that enable vision-language medical systems to perform topology-aware anatomical reasoning beyond embedding similarity.
Abstract: Vision-language foundation models achieve strong performance in medical image understanding, yet their decision-making largely relies on embedding similarity and lacks explicit reasoning over anatomical structure. In abdominal oncology, clinical decisions depend on structured relational factors—such as tumor burden, lesion multiplicity, vascular proximity, and cross-organ topology—that are not explicitly represented in voxel-level predictions or multimodal embeddings, limiting interpretability and compositional clinical reasoning. We propose an image-grounded, lesion-centric Vision Knowledge Graph (VKG) framework that elevates segmentation outputs into structured relational representations encoding anatomical containment, spatial proximity, organ topology, and quantitative tumor burden. By integrating a symbolic anatomical ontology with learned graph representations, VKG establishes a neuro-symbolic post-perception reasoning layer complementary to vision and vision-language foundation models. This design enables explicit topology-aware constraint verification, interpretable multi-hop reasoning, and anatomically grounded inference derived directly from imaging evidence. VKG supports three reasoning capabilities: (i) compositional multi-constraint retrieval; (ii) anatomically grounded risk stratification aligned with oncologic staging factors; and (iii) interpretable evidence-path generation linking predictions to structured anatomical context. Across three abdominal CT cohorts (LiTS, Pancreas Tumor CT, and FLARE) and controlled capability tiers ranging from tabular attributes to multimodal and graph representations, VKG consistently improves topology-aware retrieval (nDCG@10), anatomical risk prediction (AUROC), and cross-dataset generalization. Gains are most pronounced in structurally complex, multi-constraint scenarios, demonstrating that embedding similarity alone is insufficient for clinically meaningful reasoning.
These results position image-grounded knowledge graphs as a practical reasoning architecture that bridges perception and interpretable clinical inference, advancing medical vision-language systems toward structured, explainable, and clinically aligned decision support.
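To make the abstract's core idea concrete, the sketch below shows what a lesion-centric knowledge graph with compositional multi-constraint retrieval could look like in miniature. This is purely illustrative: the node names (`lesion_1`, `liver`, `portal_vein`), relation types (`contained_in`, `proximal_to`), and the query interface are hypothetical and are not taken from the paper's actual VKG schema or its construction pipeline from segmentation outputs.

```python
from dataclasses import dataclass, field

@dataclass
class VKG:
    """Toy lesion-centric knowledge graph: typed edges over anatomical entities.

    Hypothetical sketch only; the paper's VKG additionally encodes organ
    topology and quantitative tumor burden, which are omitted here.
    """
    # edges[relation] holds a set of (subject, object) pairs
    edges: dict = field(default_factory=dict)

    def add(self, subj: str, rel: str, obj: str) -> None:
        self.edges.setdefault(rel, set()).add((subj, obj))

    def related(self, subj: str, rel: str) -> set:
        # All objects reachable from `subj` via relation `rel`
        return {o for s, o in self.edges.get(rel, set()) if s == subj}

def multi_constraint_query(g: VKG, constraints: list) -> set:
    """Return lesion nodes satisfying every (relation, object) constraint."""
    lesions = {s for pairs in g.edges.values()
               for s, _ in pairs if s.startswith("lesion")}
    return {l for l in lesions
            if all(obj in g.related(l, rel) for rel, obj in constraints)}

# Build a tiny graph from (hypothetical) segmentation-derived facts.
g = VKG()
g.add("lesion_1", "contained_in", "liver")
g.add("lesion_1", "proximal_to", "portal_vein")
g.add("lesion_2", "contained_in", "liver")
g.add("lesion_3", "contained_in", "pancreas")

# Compositional retrieval: liver lesions near the portal vein.
hits = multi_constraint_query(g, [("contained_in", "liver"),
                                  ("proximal_to", "portal_vein")])
print(sorted(hits))  # ['lesion_1']
```

The same relational structure would also support the paper's other two capabilities: the satisfied constraints form an explicit evidence path for each retrieved lesion, and aggregate edge statistics (e.g., lesion counts per organ) can feed a risk-stratification model.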
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 19