Keywords: Large Language Models, Causal Bayesian Networks
TL;DR: We evaluate LLMs for causal reasoning in two medical domains.
Abstract: Large Language Models (LLMs) are increasingly being used for medical advice by patients and healthcare providers. These models capture knowledge from their training data, which consists of vast medical corpora. However, they lack the ability to use this knowledge to causally reason about the underlying physiological processes. Moreover, they are unable to deal with uncertainty, generating responses that are confidently presented yet factually incorrect. Acting on such factually incorrect medical advice can be dangerous. Mitigating these risks requires rethinking the role of LLMs in medicine. In this work, we present an evaluation scheme for LLMs in three roles: direct clinical decision support, exact medical knowledge base, and approximate medical knowledge base. We evaluate six LLMs on two clinical studies, one in obstetrics and one in pediatric critical care. Our results indicate that LLMs are much better suited to the approximate knowledge base role. Based on these observations, we urge caution when directly employing LLMs in safety-critical domains such as medicine.
Paper Type: New Full Paper
Submission Number: 21