Learning to Be Cautious

TMLR Paper4985 Authors

28 May 2025 (modified: 05 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best avoid bad outcomes. An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contrast, current approaches typically embed task-specific safety information or explicitly cautious behaviors into the system, which is error-prone and imposes extra burdens on practitioners. In this paper, we present both a sequence of tasks where cautious behavior becomes increasingly non-obvious and an algorithm demonstrating that it is possible for a system to learn to be cautious. The essential features of our algorithm are that it characterizes reward function uncertainty without task-specific safety information and uses this uncertainty to construct a robust policy. Specifically, we construct robust policies with a $k$-of-$N$ counterfactual regret minimization (CFR) subroutine, given a learned reward function uncertainty represented by a neural network ensemble belief. These policies exhibit caution in each of our tasks without any task-specific safety tuning.
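To make the $k$-of-$N$ idea described in the abstract concrete, the sketch below shows the core selection step under stated assumptions: sample $N$ reward hypotheses from an ensemble belief, evaluate the current policy under each, and keep the $k$ hypotheses with the worst returns for the robust policy update. This is not the authors' implementation; the CFR-based policy optimization against the selected worst-case rewards is omitted, and all names (`ensemble_reward`, `rollout_return`, `policy`) and the toy dynamics are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation) of k-of-N selection over
# an ensemble belief of learned reward functions.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_reward(member, state, action):
    """Stand-in for one ensemble member's learned reward network."""
    return np.tanh(member @ np.concatenate([state, action]))

def rollout_return(policy, reward_fn, horizon=20, state_dim=4):
    """Monte-Carlo return of `policy` under one sampled reward hypothesis."""
    state = rng.normal(size=state_dim)
    total = 0.0
    for _ in range(horizon):
        action = policy(state)
        total += reward_fn(state, action)
        state = rng.normal(size=state_dim)  # toy dynamics placeholder
    return total

def k_of_N_worst_rewards(ensemble, policy, k, N):
    """Sample N reward hypotheses from the ensemble belief and return the
    k hypotheses under which the current policy performs worst."""
    members = [ensemble[rng.integers(len(ensemble))] for _ in range(N)]
    returns = [
        rollout_return(policy, lambda s, a, m=m: ensemble_reward(m, s, a))
        for m in members
    ]
    worst = np.argsort(returns)[:k]      # indices of the k lowest returns
    return [members[i] for i in worst]   # optimize the policy against these

# Toy usage: a random linear "ensemble" and a fixed policy.
ensemble = [rng.normal(size=6) for _ in range(8)]
policy = lambda s: np.clip(s[:2], -1, 1)
worst_members = k_of_N_worst_rewards(ensemble, policy, k=2, N=10)
print(len(worst_members), "worst-case reward hypotheses selected")
```

In the paper's setting, a CFR subroutine then computes a policy that is robust to this adversarially selected subset of reward hypotheses, which is what yields cautious behavior without task-specific safety tuning.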
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. The **pseudo-code** of our method was moved from the appendix to the main body of the paper.
2. **Four Safe RL–related works** were added to the related work section in Appendix A, and their similarities to and key differences from our method were highlighted.
3. Figure 2 was updated by **adding a violin plot** to show the distribution across multiple runs.
4. **The appendix was reorganized** into distinct sections for experiments and proofs, improving clarity and structure.
5. **A sensitivity study of the "help" reward** was conducted for both the "learning to ask for help" and "ask for help only when it is available" tasks, and the results were added to Appendix B.1.3.
6. **A sentence was added to the broader impact statement** to note the potential risk of adversaries manipulating uncertainty, possibly leading to excessive conservatism.
7. A clarification was added in the experimental section **explaining why Bayesian RL, POMDP, and distributional RL baselines were not included**, due to the fully observable and deterministic nature of our environments.
8. A statement was added to Section 3 (Inference) explaining that the **ensemble model's computational cost** scales with $N$ and the number of $k$-of-$N$ iterations, rather than with the size of the state space.
Assigned Action Editor: ~Dileep_Kalathil1
Submission Number: 4985