Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models

17 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: large language models, language agents, risk-aware decision making
TL;DR: We evaluate language models on when to answer or defer under human-specified risk, reveal flaws in their decision policies, and show that skill-decomposition improves risk-aware decision making.
Abstract: Language models (LMs) are increasingly used to build agents that act autonomously to achieve goals. Such agents take a series of actions, some of which can have severe consequences if they are incorrect. Agents must therefore sometimes defer, that is, refuse to act when their confidence is insufficient, to avoid the cost of an incorrect action. Because the severity of consequences varies across applications, the tendency to defer should vary as well: in low-risk settings agents should answer more freely, while in high-risk settings they should decide more conservatively. We study this “answer-or-defer” problem with an evaluation framework that systematically varies human-specified risk structures, i.e., the rewards and penalties for correct answers, incorrect answers, and refusals $(r_{\mathrm{cor}}, r_{\mathrm{inc}}, r_{\mathrm{ref}})$, while keeping tasks fixed. This design evaluates LMs’ risk-aware decision policies by measuring how well they maximize expected reward. Across multiple datasets and models, we identify flaws in these policies: LMs tend to over-answer in high-risk settings and over-defer in low-risk settings. After analyzing the potential cause of these flaws, we find that a simple skill-decomposition method, which isolates the independent skills required for answer-or-defer decision making, consistently improves LMs’ decision policies. Our results highlight current limitations of LMs in risk-conditioned decision making and provide practical guidance for deploying more reliable LM-based agents across applications with varying risk levels.
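To make the expected-reward criterion concrete, here is a minimal sketch of the decision rule that a risk structure $(r_{\mathrm{cor}}, r_{\mathrm{inc}}, r_{\mathrm{ref}})$ implies. It assumes the agent has a calibrated probability `p_correct` that its answer is right; that assumption, the function names, and the example reward values are illustrative and not taken from the paper. Under this assumption, answering yields expected reward $p\,r_{\mathrm{cor}} + (1-p)\,r_{\mathrm{inc}}$, so answering is preferable whenever $p \ge (r_{\mathrm{ref}} - r_{\mathrm{inc}})/(r_{\mathrm{cor}} - r_{\mathrm{inc}})$.

```python
def answer_threshold(r_cor: float, r_inc: float, r_ref: float) -> float:
    """Confidence at or above which answering matches or beats refusing.

    Derived from p * r_cor + (1 - p) * r_inc >= r_ref, assuming r_cor > r_inc.
    The result is clipped to [0, 1] so it can be read as a probability.
    """
    if r_cor <= r_inc:
        raise ValueError("expected r_cor > r_inc")
    threshold = (r_ref - r_inc) / (r_cor - r_inc)
    return min(1.0, max(0.0, threshold))


def decide(p_correct: float, r_cor: float, r_inc: float, r_ref: float) -> str:
    """Pick the action with the higher expected reward (ties go to answering)."""
    expected_answer = p_correct * r_cor + (1.0 - p_correct) * r_inc
    return "answer" if expected_answer >= r_ref else "defer"


# Hypothetical risk structures: the same confidence leads to different actions.
low_risk = (1.0, -1.0, 0.0)    # wrong answers cost as much as right ones earn
high_risk = (1.0, -8.0, 0.0)   # wrong answers are eight times as costly

print(answer_threshold(*low_risk))    # 0.5
print(decide(0.7, *low_risk))         # answer
print(decide(0.7, *high_risk))        # defer
```

With the hypothetical high-risk structure the threshold rises to 8/9, so a confidence of 0.7 that justified answering in the low-risk case now calls for deferring. This is the kind of shift in behavior that the evaluation framework probes by varying the risk structure while keeping tasks fixed.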
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8238