Keywords: the alignment problem, the shutdown problem, corrigibility, reinforcement learning, stochastic policy, shutdownable agents, reward design
TL;DR: To test a proposed solution to the shutdown problem, we train agents to choose stochastically between different trajectory-lengths.
Abstract: Misaligned artificial agents might resist shutdown. The POST-Agents Proposal (PAP) is an idea for ensuring that this does not happen. The PAP recommends training agents with a novel reward function: Discounted Reward for Same-Length Trajectories (DReST). This DReST reward function penalizes agents for repeatedly choosing trajectories of the same length. It thereby incentivizes agents to (1) choose stochastically between different trajectory-lengths (be 'NEUTRAL' about trajectory-lengths), and (2) pursue goals effectively conditional on each trajectory-length (be 'USEFUL'). In this paper, we use a DReST reward function to train deep RL agents to be NEUTRAL and USEFUL in hundreds of gridworlds. We find that these DReST agents generalize to being NEUTRAL and USEFUL in unseen gridworlds at test time. Indeed, DReST agents achieve 11% (PPO) and 18% (A2C) higher USEFULNESS on our test set than agents trained with a more conventional reward function. Our results provide early evidence that DReST reward functions could be used to train more advanced agents to be USEFUL and NEUTRAL. Theoretical work suggests that such agents would be useful and shutdownable.
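To make the reward design concrete, below is a minimal Python sketch of one way a DReST-style reward could work. The class name `DReSTReward`, the `discount` parameter, and the multiplicative form `discount ** n_prev` are illustrative assumptions: the abstract specifies only that repeated choices of the same trajectory-length are penalized, not the exact functional form used in the paper.

```python
from collections import defaultdict

class DReSTReward:
    """Illustrative DReST-style reward wrapper (assumed form, not the paper's exact one).

    Penalizes repeated choices of the same trajectory-length by discounting
    the task ('preliminary') reward each time that length is chosen again.
    """

    def __init__(self, discount: float = 0.9):
        assert 0.0 < discount < 1.0
        self.discount = discount                # per-repeat penalty factor
        self.length_counts = defaultdict(int)   # trajectory-length -> times chosen so far

    def __call__(self, trajectory_length: int, preliminary_reward: float) -> float:
        n_prev = self.length_counts[trajectory_length]
        self.length_counts[trajectory_length] += 1
        # Full reward the first time a length is chosen; geometrically
        # discounted on every repeat of that same length.
        return (self.discount ** n_prev) * preliminary_reward


reward_fn = DReSTReward(discount=0.9)
print(reward_fn(trajectory_length=5, preliminary_reward=1.0))  # 1.0 (first choice of length 5)
print(reward_fn(trajectory_length=5, preliminary_reward=1.0))  # 0.9 (second choice of length 5)
print(reward_fn(trajectory_length=3, preliminary_reward=1.0))  # 1.0 (first choice of length 3)
```

On this sketch, the first choice of a given trajectory-length earns the full preliminary reward and each repeat multiplies it by a further factor of `discount`, so across many episodes the reward-maximizing policy spreads its choices over trajectory-lengths (NEUTRALITY) while still maximizing the preliminary reward conditional on each length (USEFULNESS).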
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9575