Keywords: Zero-shot RL; Inverse Inverse Planning; Cognitive agent
TL;DR: We leverage Inverse Inverse Planning (IIP) to enhance zero-shot reinforcement learning models, enabling interpretable and controllable generalization.
Abstract: Zero-shot reinforcement learning (ZSRL) trains agents to solve tasks that are not explicitly encountered during training. While recent ZSRL methods demonstrate impressive generalization capabilities, the interpretability of their zero-shot behaviors remains largely unaddressed. This poses a challenge for real-world deployment in safety-critical domains such as autonomous driving and assistive robotics. In this paper, we propose a novel integration of Inverse Inverse Planning (IIP), a behavior modification technique inspired by narrative analogies in storytelling, into the ZSRL setting. Our approach enables users to remove specific task-level intentions from a zero-shot policy without additional retraining. The result is a modified agent whose behavior is easier to inspect, explain, and control. We demonstrate that IIP can selectively suppress undesired behaviors in new tasks while preserving performance on the original task, offering a new direction for interpretable and controllable generalization in ZSRL.
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 7653