Learning to Be Cautious

TMLR Paper4985 Authors

28 May 2025 (modified: 05 Aug 2025) · Under review for TMLR · CC BY 4.0
Abstract: A key challenge in the field of reinforcement learning is to develop agents that behave cautiously in novel situations. It is generally impossible to anticipate all situations that an autonomous system may face or what behavior would best avoid bad outcomes. An agent that could learn to be cautious would overcome this challenge by discovering for itself when and how to behave cautiously. In contrast, current approaches typically embed task-specific safety information or explicitly cautious behaviors into the system, which is error-prone and imposes extra burdens on practitioners. In this paper, we present both a sequence of tasks where cautious behavior becomes increasingly non-obvious and an algorithm demonstrating that it is possible for a system to learn to be cautious. The essential features of our algorithm are that it characterizes reward function uncertainty without task-specific safety information and uses this uncertainty to construct a robust policy. Specifically, we construct robust policies with a $k$-of-$N$ counterfactual regret minimization (CFR) subroutine, given a learned reward function uncertainty represented by a neural network ensemble belief. These policies exhibit caution in each of our tasks without any task-specific safety tuning.
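To make the $k$-of-$N$ idea described in the abstract concrete, the sketch below shows the core selection step under stated assumptions: sample $N$ reward hypotheses from an ensemble belief, evaluate the current policy under each, and keep the $k$ hypotheses with the worst returns for the robust policy update. This is not the authors' implementation; the CFR-based policy optimization against the selected worst-case rewards is omitted, and all names (`ensemble_reward`, `rollout_return`, `policy`) and the toy dynamics are hypothetical placeholders.

```python
# Minimal sketch (not the paper's implementation) of k-of-N selection over
# an ensemble belief of learned reward functions.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_reward(member, state, action):
    """Stand-in for one ensemble member's learned reward network."""
    return np.tanh(member @ np.concatenate([state, action]))

def rollout_return(policy, reward_fn, horizon=20, state_dim=4):
    """Monte-Carlo return of `policy` under one sampled reward hypothesis."""
    state = rng.normal(size=state_dim)
    total = 0.0
    for _ in range(horizon):
        action = policy(state)
        total += reward_fn(state, action)
        state = rng.normal(size=state_dim)  # toy dynamics placeholder
    return total

def k_of_N_worst_rewards(ensemble, policy, k, N):
    """Sample N reward hypotheses from the ensemble belief and return the
    k hypotheses under which the current policy performs worst."""
    members = [ensemble[rng.integers(len(ensemble))] for _ in range(N)]
    returns = [
        rollout_return(policy, lambda s, a, m=m: ensemble_reward(m, s, a))
        for m in members
    ]
    worst = np.argsort(returns)[:k]      # indices of the k lowest returns
    return [members[i] for i in worst]   # optimize the policy against these

# Toy usage: a random linear "ensemble" and a fixed policy.
ensemble = [rng.normal(size=6) for _ in range(8)]
policy = lambda s: np.clip(s[:2], -1, 1)
worst_members = k_of_N_worst_rewards(ensemble, policy, k=2, N=10)
print(len(worst_members), "worst-case reward hypotheses selected")
```

In the paper's setting, a CFR subroutine then computes a policy that is robust to this adversarially selected subset of reward hypotheses, which is what yields cautious behavior without task-specific safety tuning.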
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
1. The **pseudo-code** of our method was moved from the appendix to the main body of the paper.
2. **Four Safe RL–related works** were added to the related work section in Appendix A, and their similarities to and key differences from our method were highlighted.
3. Figure 2 was updated by **adding a violin plot** to show the distribution across multiple runs.
4. **The appendix was reorganized** into distinct sections for experiments and proofs, improving clarity and structure.
5. **A sensitivity study of the "help" reward** was conducted for both the "learning to ask for help" and "ask for help only when it is available" tasks, and the results were added to Appendix B.1.3.
6. **A sentence was added to the broader impact statement** to note the potential risk of adversaries manipulating uncertainty, possibly leading to excessive conservatism.
7. A clarification was added in the experimental section **explaining why Bayesian RL, POMDP, and distributional RL baselines were not included**, due to the fully observable and deterministic nature of our environments.
8. A statement was added to Section 3 (Inference) explaining that the **ensemble model's computational cost** scales with $N$ and the number of $k$-of-$N$ iterations, rather than with the size of the state space.
Assigned Action Editor: ~Dileep_Kalathil1
Submission Number: 4985