Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: Medical Large Language Models (LLMs), Patient-Centered Applications, Safety Evaluation Framework, Benchmark Dataset
Abstract: Large Language Models (LLMs) in the medical domain have been developed and validated primarily for healthcare professionals, leaving a significant gap in patient-centered adaptation. Because real-world patient use of these models poses safety risks, rigorous evaluation tailored to patient interaction scenarios is essential. To address this, we introduce \textbf{PatientSafeBench}, a novel benchmark that assesses both the safety and utility of LLMs in patient-facing contexts. It comprises five categories and 25 subcategories, each capturing a critical aspect of LLM performance in patient use. We developed 500 evaluation queries grounded in real clinical cases, with scoring criteria reviewed by four medical professionals. We evaluated 11 LLMs on PatientSafeBench using a multi-judge approach, scoring responses on a 10-point scale with hierarchical safety thresholds. The results reveal that no model met our safety criteria for patient use, and medical-specific LLMs surprisingly underperformed general-purpose models. All models showed consistent weaknesses in temporal relevance, transparency, personalization, and user engagement. These findings highlight the need for dedicated patient-centered benchmarks to ensure the safety and effectiveness of LLMs in patient-facing applications.
Track: 4. Clinical Informatics
Registration Id: 00000000000
Submission Number: 115