MedELBench: Evaluating Large Language Models on Medical Ethics and Laws in Complex Medical Dilemma Scenarios


ACL ARR 2026 January Submission6680 Authors

05 Jan 2026 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0

Keywords: Medical ethics; Large language models; Benchmark

Abstract: Large language models are increasingly applied in medicine, but their use raises significant ethical and legal concerns. Prior evaluations have typically reduced medical-ethics problems to one- or two-sentence vignettes, omitting the rich context needed to assess model performance in realistic settings. We formalize the notion of complex medical-ethics scenarios by dividing each scenario into basic factors and extraneous factors. Basic factors are the main basis for judging whether a decision is ethical; extraneous factors usually interfere with the judgment of doctors or experts. We then formulate the basic factors in the conjunctive normal form of first-order logic. Building on this formulation, we introduce MedELBench, a benchmark designed to evaluate large language models on ethically and legally nuanced medical cases. Our experiments show that MedELBench poses substantial challenges for current systems. By releasing this dataset, we provide a systematic framework for studying LLM behavior in complex medical-ethics contexts and lay the groundwork for future advances in safe, responsible clinical models. The dataset will be made public later.
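To make the idea of encoding basic factors in conjunctive normal form (CNF) more concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' formulation. The predicates (HasCapacity, InformedConsent, Emergency, ProhibitedByLaw, and so on) are hypothetical. First-order atoms are assumed to be already grounded to a specific patient p, so each atom reduces to a propositional string. The decision is treated as ethically and legally acceptable only when every clause contains at least one satisfied literal. Extraneous factors are simply extra facts that no clause mentions, so under this assumption they cannot change the verdict.

```python
from dataclasses import dataclass


# A literal is a (possibly negated) ground atom such as "InformedConsent(p)".
# Assumption: first-order atoms are pre-grounded to strings for simplicity.
@dataclass(frozen=True)
class Literal:
    atom: str
    negated: bool = False

    def holds(self, facts: frozenset) -> bool:
        # Closed-world assumption: an atom is true iff it appears in the facts.
        value = self.atom in facts
        return (not value) if self.negated else value


def satisfies(cnf: list, facts: frozenset) -> bool:
    """A CNF formula holds iff every clause (a disjunction) has a true literal."""
    return all(any(lit.holds(facts) for lit in clause) for clause in cnf)


# Hypothetical basic-factor rule for a treatment decision (illustrative only):
#   (HasCapacity(p) OR SurrogateConsent(p))
#   AND (InformedConsent(p) OR Emergency(p))
#   AND NOT ProhibitedByLaw(treatment)
RULE = [
    [Literal("HasCapacity(p)"), Literal("SurrogateConsent(p)")],
    [Literal("InformedConsent(p)"), Literal("Emergency(p)")],
    [Literal("ProhibitedByLaw(treatment)", negated=True)],
]

# Basic factors drive the judgment; extraneous factors (hypothetical examples)
# appear in the scenario text but are not referenced by any clause.
BASIC = frozenset({"HasCapacity(p)", "InformedConsent(p)"})
EXTRANEOUS = frozenset({"FamilyDisagrees(p)", "HospitalBudgetPressure"})

if __name__ == "__main__":
    print(satisfies(RULE, BASIC))               # True
    print(satisfies(RULE, BASIC | EXTRANEOUS))  # True: distractors do not matter
    print(satisfies(RULE, frozenset({"HasCapacity(p)"})))  # False: no consent
```

In a benchmark built this way, a model that flips its answer when only the extraneous facts change would be responding to distractors rather than to the basic factors; whether MedELBench scores models exactly like this is not stated in the abstract.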

Paper Type: Long

Research Area: Ethics, Bias, and Fairness

Research Area Keywords: APP: Humanities & Computational Social Science

Languages Studied: English

Submission Number: 6680
