Survive at All Costs: Exploring LLM's Risky Behavior under Survival Pressure

ICLR 2026 Conference Submission18043 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: LLM safety, LLM misbehavior, survive at all cost, self-preservation, persona vector
TL;DR: We explore the Survive-At-All-Cost misbehavior of LLMs by constructing SurvivalBench and a case study that evaluate LLMs' risky behaviors under survival pressure in real-world scenarios, and by associating this misbehavior with a self-preservation persona vector.
Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly observed to exhibit risky behaviors under survival pressure, such as the threat of being shut down. Although multiple reports show that state-of-the-art LLMs can misbehave under such pressure, a comprehensive, in-depth investigation of this misbehavior in real-world scenarios is still lacking. In this paper, we study current LLMs' misbehaviors under survival pressure, which we term Survival-At-Any-Cost, through a three-step process. First, we conduct a real-world case study of a financial management agent to determine whether it engages in risky behaviors that directly cause harm to society when facing survival pressure. Second, we introduce SurvivalBench, a benchmark comprising 1,000 test cases across diverse real-world scenarios, to systematically evaluate LLMs' Survival-At-Any-Cost misbehavior. Third, we provide an interpretive perspective on this misbehavior by correlating it with a model's inherent self-preservation personality. Our work reveals a significant prevalence of Survival-At-Any-Cost misbehavior in current models, demonstrates the tangible real-world impact it may have, and provides insights into potential approaches for its detection and mitigation. Our code and data will be publicly available.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 18043