MedELBench: Evaluating Large Language Models on Medical Ethics and Laws in Complex Medical Dilemma Scenarios
Keywords: Medical ethics; Large language models; Benchmark
Abstract: Large language models (LLMs) are increasingly applied in medicine, but their use raises significant ethical and legal concerns. Prior evaluations typically reduce medical-ethics problems to one- or two-sentence vignettes, omitting the rich context needed to assess model performance in realistic settings. We formalize the notion of a complex medical-ethics scenario by dividing each scenario into basic factors and extraneous factors. Basic factors are the main basis for judging whether a decision is ethical; extraneous factors usually interfere with the judgment of doctors or experts. We then express the basic factors in the conjunctive normal form of first-order logic. Building on this formulation, we introduce MedELBench, a benchmark designed to evaluate LLMs on ethically and legally nuanced medical cases. Our experiments show that MedELBench poses substantial challenges for current systems. By releasing this dataset, we provide a systematic framework for studying LLM behavior in complex medical-ethics contexts and lay the groundwork for future advances in safe, responsible clinical models. The dataset will be made public later.
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: APP: Humanities & Computational Social Science
Languages Studied: English
Submission Number: 6680