Goal Misgeneralization as Implicit Goal Conditioning
Confirmation: I have read and confirm that at least one author will be attending the workshop in person if the submission is accepted
Keywords: goal misgeneralization, reinforcement learning
TL;DR: Goal misgeneralization often arises from ambiguity in training. We explore what causes an agent to pursue one particular goal over another.
Abstract: While many examples of goal misspecification have been dissected in the reinforcement learning literature, few works have focused on the more recently identified problem of goal misgeneralization. Because goal misgeneralization often stems from underspecification, we explore a simple environment in which some goals can be specified through explicit conditioning and others cannot. We find that agents generally pursue a mixture of the possible goals, and that which goal is pursued is often difficult to explain. Nonetheless, we attempt an explanation in terms of implicit goal conditioning, wherein subtle environment features determine which goal is pursued, and aim to understand which features induce pursuit of one goal over another.
Submission Number: 39