Restoring Trust in Medical LLMs: GNN-Powered Knowledge Graph Reconstruction for Robust Defense

ICLR 2026 Conference Submission 24966 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Medical Large Language Models, Robust Defense, Data-Poisoning Attacks
Abstract: Medical large language models (LLMs) have demonstrated remarkable capabilities in clinical decision support and biomedical question answering, yet they remain highly vulnerable to adversarial threats such as prompt injection, data poisoning, and parameter tampering. As reported in Nature Medicine (2025), existing defense mechanisms based on static triple-form knowledge graphs (KGs) lack structural adaptability, making them ineffective against multi-hop reasoning attacks and semantic perturbations. To address this challenge, we propose a structure-aware KG reconstruction framework powered by graph neural networks (GNNs), which dynamically reweights relational edges, filters adversarial connections, and stabilizes semantic propagation while preserving triple compatibility. By incorporating relation-aware weighted triples, our method achieves stronger adversarial robustness than conventional equal-weight KGs. On QA benchmarks, it improves accuracy and related metrics by an average of 3% over existing defense methods; on drug-recommendation ranking tasks, it balances accuracy and completeness. Our approach outperforms vanilla LLMs and existing defenses, effectively restoring pre-attack performance and enabling trustworthy, robust medical LLM applications.
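The mechanism named in the abstract, relation-aware reweighting of KG triples followed by filtering of suspected adversarial edges, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the module name, the scoring network, the threshold tau, and the weighted-mean aggregation are all illustrative assumptions, chosen only to show how a learned per-triple weight can gate edges before message passing while the graph stays stored as plain (head, relation, tail) triples.

import torch
import torch.nn as nn

class RelationAwareReweighting(nn.Module):
    """Sketch: score each (head, relation, tail) triple, drop low-weight
    (suspected adversarial) edges, then aggregate weighted neighbor messages.
    All dimensions and the threshold tau are hypothetical choices."""

    def __init__(self, num_entities, num_relations, dim=128, tau=0.1):
        super().__init__()
        self.ent = nn.Embedding(num_entities, dim)
        self.rel = nn.Embedding(num_relations, dim)
        # MLP that maps a concatenated (h, r, t) embedding to an edge score
        self.scorer = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.tau = tau  # edges scoring below tau are filtered out

    def forward(self, heads, rels, tails):
        # heads / rels / tails: LongTensors of shape [num_edges], one triple each
        h, r, t = self.ent(heads), self.rel(rels), self.ent(tails)
        w = torch.sigmoid(self.scorer(torch.cat([h, r, t], dim=-1))).squeeze(-1)
        keep = w > self.tau  # boolean mask over edges
        # weighted neighbor messages, summed into each surviving head entity
        msgs = w[keep].unsqueeze(-1) * (r[keep] + t[keep])
        agg = torch.zeros_like(self.ent.weight)
        agg.index_add_(0, heads[keep], msgs)
        # normalize by the total kept weight per entity (weighted mean)
        deg = torch.zeros(self.ent.num_embeddings, device=msgs.device)
        deg.index_add_(0, heads[keep], w[keep])
        return agg / deg.clamp_min(1e-6).unsqueeze(-1), w

One design point this sketch tries to respect: because weights are attached to triples rather than to a rewritten graph structure, the reweighted KG remains compatible with standard triple stores, which matches the abstract's stated goal of preserving triple compatibility.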
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 24966