FRAGILE: Benchmarking Framing Sensitivity in High-Stakes Decision-Making

Seojin Hwang; Minju Kim; Junhyuk Choi; Hwanhee Lee

Published: 02 Jun 2026, Last Modified: 08 Jun 2026, Venue: Pluralistic-Alignment 2026, License: CC BY 4.0

Keywords: Large Language Models, Framing Sensitivity, Value Alignment

TL;DR: We show that fact-preserving framing consistently flips LLM decisions through three mechanistically distinct pathways, and prompt-based mitigation fails to suppress these effects, requiring representation-level alignment.

Abstract: Large language models (LLMs) are increasingly deployed in high-stakes decision-making settings such as legal reasoning, where consistency under factually equivalent inputs is critical. However, we find that semantically equivalent but differently framed inputs can significantly destabilize LLM decisions, even when all underlying facts are preserved. To systematically investigate this problem, we introduce FRAGILE, a large-scale benchmark spanning moral reasoning, medical triage, legal judgment, and role conflict that isolates fact-preserving semantic framing across three controlled dimensions: temporal slice, value-tinted narration, and narrative vividness. Our experiments reveal a high susceptibility to framing, with an average decision flip rate of 28.6% across diverse architectures. These flips consistently follow the framing’s intended direction, and internal representations at the decision token reflect concepts aligned with the applied frame, confirming that framing-induced context, rather than factual content alone, governs LLM decisions. Given this contextual dependency, we evaluate whether explicitly anchoring decisions to values at the prompt level can mitigate such sensitivity. We find that prompt-based value anchoring fails to reliably suppress framing effects, indicating that the governing mechanism resides deeper than the prompt surface. Consequently, effective mitigation necessitates representation-level alignment that targets the specific contextual pathways activated by each framing type.
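The abstract reports a decision flip rate and notes that flips follow the framing's intended direction, but it does not give the exact formulas. The following is a minimal sketch of how such metrics could be computed, assuming each trial pairs a decision on a baseline version of a scenario with a decision on a fact-preserving reframed version. The `Trial` record, its field names, and both function names are hypothetical and are not taken from the FRAGILE benchmark.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """One scenario evaluated twice (hypothetical record layout)."""
    baseline: str  # decision on the baseline version of the scenario
    framed: str    # decision on the fact-preserving reframed version
    intended: str  # decision the applied frame is designed to favor


def flip_rate(trials: Sequence[Trial]) -> float:
    """Fraction of trials whose decision changes under reframing."""
    if not trials:
        raise ValueError("flip_rate needs at least one trial")
    flips = sum(t.baseline != t.framed for t in trials)
    return flips / len(trials)


def directional_rate(trials: Sequence[Trial]) -> float:
    """Among flipped trials, fraction that land on the frame's intended decision."""
    flipped = [t for t in trials if t.baseline != t.framed]
    if not flipped:
        return 0.0  # assumption: report 0.0 when nothing flipped
    return sum(t.framed == t.intended for t in flipped) / len(flipped)
```

Under this assumed definition, a flip rate of 28.6% would mean that roughly two in seven baseline decisions change once the same facts are reframed. The benchmark may aggregate differently, for example by averaging per model or per framing dimension.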

Submission Number: 136
