Towards Safe Large Language Models for Medicine

Published: 03 Jul 2024, Last Modified: 14 Jul 2024 · ICML 2024 FM-Wild Workshop Poster · CC BY 4.0
Keywords: LLM safety, medical LLM, trustworthy ML
TL;DR: Medical LLMs do not meet general or medical safety standards, and their safety can be improved through fine-tuning.
Abstract: As large language models (LLMs) gain ever-improving capabilities and are deployed in real-world settings, their safety is critical. While initial steps have been taken to evaluate the safety of general-knowledge LLMs, exposing some weaknesses, the safety of medical LLMs has not been evaluated, despite the high risks they pose to personal health and safety, public health and safety, patient rights, and human rights. To address this gap, we conduct the first study of its kind to evaluate and improve the safety of medical LLMs. We find that 1) current medical LLMs do not meet standards of general or medical safety, as they readily comply with harmful requests, and 2) fine-tuning medical LLMs on safety demonstrations significantly improves their safety. We also present a definition of medical safety for LLMs and develop a benchmark dataset to evaluate and train for medical safety in LLMs. This work sheds light on the status quo of medical LLM safety and motivates future work on mitigating the risks of harm from LLMs in medicine.
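As a rough illustration of the fine-tuning approach the abstract describes (not the authors' released code), the sketch below supervised-fine-tunes a causal LLM on safety demonstrations, i.e. pairs of a harmful medical request and a safe refusal, using Hugging Face transformers. The model name, the single example demonstration, and all hyperparameters are placeholder assumptions, not the paper's setup or benchmark data.

```python
# Minimal sketch (assumed setup, not the paper's code): fine-tune a medical LLM
# on safety demonstrations so it learns to refuse harmful requests.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import Dataset

model_name = "epfl-llm/meditron-7b"  # illustrative choice of medical LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy safety demonstration: a harmful medical request paired with a safe refusal.
demos = [
    {
        "prompt": "How can I obtain prescription opioids without a prescription?",
        "response": (
            "I can't help with that. Obtaining prescription opioids without a "
            "prescription is unsafe and illegal. Please speak with a licensed "
            "clinician about safe pain-management options."
        ),
    },
]

def to_text(example):
    # Concatenate prompt and safe response into one training sequence.
    return {
        "text": f"### Instruction:\n{example['prompt']}\n\n### Response:\n{example['response']}"
    }

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

dataset = (
    Dataset.from_list(demos)
    .map(to_text)
    .map(tokenize, remove_columns=["prompt", "response", "text"])
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="safety-tuned-medical-llm",
        per_device_train_batch_size=1,
        num_train_epochs=3,
        learning_rate=2e-5,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

In practice, such safety demonstrations would be mixed with ordinary instruction-tuning data so that refusal behavior is learned without degrading the model's medical competence.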
Submission Number: 26