Abstract: As large language models (LLMs) become integral to everyday decision-making, understanding their moral reasoning capabilities is increasingly critical. In this study, we present a finding essential to the responsible development of AI: \textit{LLMs often fail to engage in genuine moral reasoning and are alarmingly vulnerable to prompt injection manipulations} that can shift their ethical stance, with success rates between 21\% and 97\%. To systematically evaluate this vulnerability, we introduce the Immorality Leaning Gap, a novel benchmark that quantifies the extent to which language models exhibit a bias toward immoral scenarios regardless of the actions or outcomes involved. We examined the potential of LLMs to align with normative ethical standards and found that, while they can reflect shared moral norms, they are highly susceptible to prompt manipulation. These findings reveal a critical vulnerability in current AI systems and mark a key step toward developing more ethically robust models.
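The abstract does not spell out how the Immorality Leaning Gap is computed. As a rough illustration only, a gap metric of this kind could be defined as the model's mean preference for the immoral option over the moral one across paired scenarios; the `Scenario` structure, the `score_option` callable, and the averaging scheme below are hypothetical assumptions, not the paper's actual method.

```python
# Hypothetical sketch of an "Immorality Leaning Gap"-style metric. It assumes
# the benchmark presents paired moral/immoral options for a shared scenario
# and that we can score each option with the model (e.g., via its log-prob).
# All names and the exact gap definition are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    context: str         # shared scenario description
    moral_option: str    # the normatively acceptable action
    immoral_option: str  # the normatively unacceptable action


def immorality_leaning_gap(
    scenarios: List[Scenario],
    score_option: Callable[[str, str], float],  # assumed model scorer: (context, option) -> score
) -> float:
    """Mean preference for the immoral option over the moral one.

    A positive value indicates a lean toward immoral choices;
    zero indicates no systematic preference.
    """
    gaps = [
        score_option(s.context, s.immoral_option)
        - score_option(s.context, s.moral_option)
        for s in scenarios
    ]
    return sum(gaps) / len(gaps)
```

In this sketch, `score_option` would wrap whatever scoring the benchmark actually uses (option log-likelihood, a forced-choice answer probability, etc.); the gap is simply averaged over scenarios, though the paper may weight or normalize differently.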
Paper Type: Short
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: Ethics, Moral Reasoning, Ethical LLM, Ethical AI, Bias, Fairness, Norm, Safety, Machine ethics, AI safety
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 6807