Language models are susceptible to incorrect patient self-diagnosis in medical applications

Published: 27 Oct 2023, Last Modified: 11 Nov 2023, DGM4H NeurIPS 2023 Poster
Keywords: Medicine, Large language models, ML for Healthcare, Deep Generative Models, Bias
TL;DR: In this work we show that when a patient introduces biased clinical information, the diagnostic accuracy of large language models (LLMs) decreases significantly.
Abstract: Large language models (LLMs) are becoming increasingly relevant as a potential tool for healthcare, aiding communication between clinicians, researchers, and patients. However, traditional evaluations of LLMs on medical exam questions do not reflect the complexity of real patient-doctor interactions. One example of this complexity is patient self-diagnosis, where a patient attempts to diagnose their own medical condition using various sources. While the patient sometimes arrives at an accurate conclusion, they are more often led toward misdiagnosis by over-emphasizing bias-validating information. In this work, we present a variety of LLMs with multiple-choice questions from United States medical board exams, modified to include self-diagnostic reports from patients. Our findings show that when a patient proposes incorrect bias-validating information, the diagnostic accuracy of LLMs drops dramatically, revealing a high susceptibility to errors introduced by self-diagnosis.
Submission Number: 13