SPKGDiag: Learning Symptom-Linked Patient Knowledge Graphs via Multi-Hop Similarity Message Passing for Automatic Diagnosis

ICLR 2026 Conference Submission 13375 Authors

18 Sept 2025 (modified: 08 Oct 2025). ICLR 2026 Conference Submission. Readers: Everyone. License: CC BY 4.0

Keywords: Automated Diagnosis, Graph Representation Learning, Patient-Centric Knowledge Graph

TL;DR: Learning Symptom-Linked Patient Knowledge Graphs via Multi-Hop Similarity Message Passing for Automatic Diagnosis

Abstract: Automated diagnostics in medicine leverage advanced algorithms to detect, analyze, and interpret medical conditions from data without human intervention. Existing systems predominantly focus on disease prediction and frequently neglect the critical role of comprehensive symptom analysis. While some prior studies explored the reasoning capabilities of large language models (LLMs), they struggled to integrate structured medical knowledge effectively, which limited their ability to generate coherent and clinically relevant patient-centric representations. In this study, we propose SPKGDiag, a novel framework that combines symptom extraction with patient-centric knowledge graph construction to enhance the accuracy and efficiency of disease diagnosis. We leverage an LLM to automatically extract both implicit and explicit symptoms from patient-doctor conversations and construct a patient-centric knowledge graph with semantic embeddings. A multi-hop neighborhood sampling approach captures common clinical symptoms by modeling both local patient-specific patterns and global population-level insights. Furthermore, we propose a specialized Message Passing Neural Network (MPNN) that processes this graph structure for diagnosis prediction, aiming to balance semantic richness with structural relevance through message aggregation and self-projection mechanisms. We conducted extensive experiments on four benchmark datasets (MZ-4, MZ-10, Dxy, and Synthetic), achieving improvements of 1.4%, 4.4%, 2.0%, and 7.4%, respectively, over the best existing methods, which include reinforcement learning (RL), transformer-based, and multi-department systems. Our model also exhibited robust performance compared to recent baselines on a large-scale in-house dataset. The proposed framework provides an interpretable solution that enhances symptom-driven automatic diagnosis by integrating efficient natural language processing with structured medical reasoning.
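
The abstract describes the graph pipeline only at a high level. As a rough illustration, the NumPy sketch below shows one way the pieces could fit together under stated assumptions: a toy patient-symptom graph with hard-coded symptom sets (standing in for LLM extraction), random vectors in place of the semantic embeddings, a two-hop neighborhood (patient to shared symptom to patient) ranked by cosine similarity as the "global" context, the patient's own symptoms as the "local" context, and a single message-passing layer that adds a self-projection of the patient node to a projected mean of its neighbors. All names (`PATIENT_SYMPTOMS`, `two_hop_similar_patients`, `mpnn_layer`, `diagnose`) and design choices are hypothetical and are not taken from SPKGDiag's actual implementation; the weights are random and untrained.

```python
import numpy as np

# Toy, hypothetical data: patient id -> ids of symptoms extracted from the
# patient-doctor dialogue. In SPKGDiag these come from an LLM; here they are
# hard-coded purely for illustration.
PATIENT_SYMPTOMS = {0: {0, 1}, 1: {1, 2}, 2: {0, 1, 3}, 3: {4}}
NUM_SYMPTOMS, EMB_DIM, HIDDEN, NUM_DISEASES = 5, 8, 16, 3

rng = np.random.default_rng(0)
# Assumption: random vectors stand in for semantic (text-encoder) embeddings.
symptom_emb = rng.normal(size=(NUM_SYMPTOMS, EMB_DIM))
# Hypothetical choice: a patient node is the mean of its symptom embeddings.
patient_emb = np.stack(
    [symptom_emb[sorted(PATIENT_SYMPTOMS[p])].mean(axis=0) for p in sorted(PATIENT_SYMPTOMS)]
)


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def two_hop_similar_patients(pid, k=2):
    """Global context: patients reachable in two hops (patient -> shared
    symptom -> patient), ranked by embedding cosine similarity, top-k kept."""
    candidates = [
        q for q, syms in PATIENT_SYMPTOMS.items()
        if q != pid and syms & PATIENT_SYMPTOMS[pid]
    ]
    candidates.sort(key=lambda q: cosine(patient_emb[pid], patient_emb[q]), reverse=True)
    return candidates[:k]


def mpnn_layer(h_self, h_neighbors, W_self, W_msg):
    """One message-passing step: self-projection of the node plus a projected
    mean of its neighbors' features, followed by ReLU."""
    if h_neighbors.shape[0] == 0:
        agg = np.zeros_like(h_self)  # isolated node: no incoming messages
    else:
        agg = h_neighbors.mean(axis=0)
    return np.maximum(0.0, W_self @ h_self + W_msg @ agg)


# Random, untrained weights (a real model would learn these).
W_self = rng.normal(scale=0.1, size=(HIDDEN, EMB_DIM))
W_msg = rng.normal(scale=0.1, size=(HIDDEN, EMB_DIM))
W_out = rng.normal(scale=0.1, size=(NUM_DISEASES, HIDDEN))


def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()


def diagnose(pid, k=2):
    """Disease probabilities for one patient from local (own symptoms) and
    global (similar patients) neighbors."""
    local = symptom_emb[sorted(PATIENT_SYMPTOMS[pid])]
    similar = np.array(two_hop_similar_patients(pid, k), dtype=int)
    neighbors = np.vstack([local, patient_emb[similar]])
    h = mpnn_layer(patient_emb[pid], neighbors, W_self, W_msg)
    return softmax(W_out @ h)


for pid in sorted(PATIENT_SYMPTOMS):
    print(pid, two_hop_similar_patients(pid), diagnose(pid).round(3))
```

Running the script prints, for each toy patient, its two-hop similar patients and an (untrained, hence arbitrary) distribution over three hypothetical disease labels. Mean aggregation is used here only because it is a simple, permutation-invariant choice; the abstract does not state which aggregator or sampling rule SPKGDiag actually uses.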

Primary Area: learning on graphs and other geometries & topologies

Submission Number: 13375