Faithfulness Hallucination Detection in Healthcare AI

Published: 29 Jun 2024, Last Modified: 30 Jul 2024, KDD-AIDSH 2024 Oral, CC BY 4.0
Keywords: AI in Healthcare, Hallucination Detection, Generative Language Models, Natural Language Processing
Abstract: Faithfulness hallucinations, where AI-generated content diverges from the input context, pose significant risks in high-stakes environments like healthcare. In clinical settings, the reliability of such systems is crucial, as deviations from the source record can lead to misdiagnoses and inappropriate treatments. Clinicians bear a substantial burden in summarizing lengthy electronic health records (EHRs), having to extract key information from extensive notes. Leveraging large language models (LLMs) to summarize these documents can aid clinicians by providing concise and accurate summaries. This study investigates faithfulness hallucinations in medical record summaries generated by LLMs such as GPT-4o and Llama-3. Our detection framework, developed in collaboration with clinicians and supported by a web-based annotation tool, categorizes five types of medical event hallucinations. A pilot study of 100 medical note summaries reveals that both recent closed-source and open-source LLMs produce these categorized hallucinations. The findings highlight the necessity of robust hallucination detection methods to ensure the reliability of AI applications in healthcare, ultimately enhancing clinical workflows and improving patient care.
Submission Number: 40