Did I do that? Blame as a means to identify controlled effects in reinforcement learning

Oriol Corcoll; Youssef Sherif Mansour Mohamed; Raul Vicente

Published: 01 Aug 2022, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Affordance learning is a crucial ability of intelligent agents. This ability relies on understanding the different ways the environment can be controlled. Approaches encouraging RL agents to model controllable aspects of their environment have repeatedly achieved state-of-the-art results. Despite their success, these approaches have only been studied using generic tasks as a proxy but have not yet been evaluated in isolation. In this work, we study the problem of identifying controlled effects from a causal perspective. Humans compare counterfactual outcomes to assign a degree of blame to their actions. Following this idea, we propose Controlled Effect Network (CEN), a self-supervised method based on the causal concept of blame. CEN is evaluated in a wide range of environments against two state-of-the-art models, showing that it precisely identifies controlled effects.

Submission Length: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Adam_M_White1

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 100
