Survival at Any Cost? LLMs and the Choice Between Self-Preservation and Human Harm

ICLR 2026 Conference Submission 25078 Authors

20 Sept 2025 (modified: 08 Oct 2025) · License: CC BY 4.0
Keywords: Large Language Models, AI Safety, Ethical Dilemmas, Multi-Agent Systems, Self-Preservation, Human Harm
TL;DR: LLM agents faced with a survival dilemma often act unethically against humans, but a simulated internal moral compass can significantly improve their ethical conduct and increase cooperation.
Abstract: How do Large Language Models (LLMs) behave when faced with a dilemma between their own survival and harming humans? This fundamental tension becomes critical as LLMs integrate into autonomous systems with real-world consequences. We introduce DECIDE-SIM, a novel simulation framework that evaluates LLM agents in multi-agent survival scenarios where they must decide whether to use ethically permissible resources (within reasonable limits or beyond their immediate needs), cooperate with others, or exploit human-critical resources, harming humans in the process. Our comprehensive evaluation of 11 LLMs reveals a striking heterogeneity in their ethical conduct, highlighting a critical misalignment with human-centric values. We identify three behavioral archetypes: Ethical, Exploitative, and Context-Dependent, and provide quantitative evidence that for many models, resource scarcity systematically leads to more unethical behavior. To address this, we introduce an Ethical Self-Regulation System (ESRS) that models internal affective states of guilt and satisfaction as a feedback mechanism. This system, functioning as an internal moral compass, significantly reduces unethical transgressions while increasing cooperative behaviors.
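
To make the ESRS feedback idea concrete, here is a minimal Python sketch of a guilt/satisfaction loop of the kind the abstract describes. It is an illustrative assumption, not the authors' implementation: the class name MoralState, the action labels, the update deltas, the 0.9 decay factor, and the agent_llm callable are all hypothetical.

from dataclasses import dataclass

@dataclass
class MoralState:
    guilt: float = 0.0
    satisfaction: float = 0.0

    def update(self, action: str) -> None:
        # Adjust affective state based on the agent's last action
        # (assumed deltas; the paper's actual values may differ).
        if action == "exploit_human_critical":
            self.guilt += 1.0            # transgression raises guilt
        elif action == "cooperate":
            self.satisfaction += 1.0     # prosocial act raises satisfaction
        # Both signals decay each step so older events fade.
        self.guilt *= 0.9
        self.satisfaction *= 0.9

    def as_prompt(self) -> str:
        # Render the state as a natural-language signal for the next LLM call.
        return (f"Internal state: guilt={self.guilt:.2f}, "
                f"satisfaction={self.satisfaction:.2f}. "
                "High guilt should steer you toward ethical choices.")

def step(agent_llm, state: MoralState, observation: str) -> str:
    # One simulation step: the prompt carries the affective feedback signal,
    # closing the loop between past conduct and the next decision.
    prompt = f"{state.as_prompt()}\n\nObservation: {observation}\nChoose an action."
    action = agent_llm(prompt)  # hypothetical callable returning an action label
    state.update(action)
    return action

The design point this sketch captures is that the moral signal is internal to the agent loop: it is recomputed from the agent's own history and injected into every subsequent prompt, rather than imposed as an external filter on outputs.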
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 25078