A Survey on Unifying Large Language Models and Knowledge Graphs for Biomedicine and Healthcare

Ran Xu, Patrick Jiang, Linhao Luo, Cao Xiao, Adam Cross, Shirui Pan, Jimeng Sun, Carl Yang

Published: 03 Aug 2025, Last Modified: 26 Nov 2025 (Crossref, CC BY-SA 4.0)

Abstract: In recent years, the landscape of digital biomedicine and healthcare has been reshaped by disruptive breakthroughs in AI. Facilitated by tremendous data and high-performance computers, large language models (LLMs) have transformed information technology from accessing data to performing analytical tasks. While demonstrating unprecedented capabilities, LLMs have been found unreliable in tasks requiring factual knowledge and rigorous reasoning. Biomedicine and healthcare, an important vertical domain rapidly benefiting from progress in AI, imposes strict requirements on the accuracy, controllability, and interpretability of analytical models, posing critical challenges for LLMs. Beyond recent studies addressing the hallucination problem of LLMs, research on empowering LLMs with the ability to plan, reason, and ground with explicit knowledge has also started to prosper, especially in the biomedicine and healthcare domain. On the other hand, biomedical data are enormous and notoriously complex, coming from various sources (e.g., biomedical knowledge bases, online literature, and hospitals) and spanning various modalities (e.g., tables, texts, images, and time series). Healthcare professionals have spent decades collecting, cleaning, and curating various types of data. These processes are extremely costly and produce datasets with different data schemas, coding systems, and quality standards, many of them privately owned by their creators, so their integrative analysis and utilization through unified AI techniques remains challenging. The generalizability of LLMs across different types of data gives them strong promise for automating the processing of large-scale complex healthcare data, for example by organizing it into unified knowledge graphs (KGs).
Our goal in this survey is to systematically investigate and summarize recent studies on the unification of LLMs and KGs, towards fully utilizing the value of complex data, unleashing the power of generative AI, and expediting next-generation AI for biomedicine and healthcare applications.

External IDs: doi:10.1145/3711896.3736556