Can Humans Be out of the Loop?

Published: 09 Feb 2022, Last Modified: 05 May 2023. CLeaR 2022 Poster.
Keywords: Causal inference, Graphical models, Reinforcement Learning
TL;DR: We propose a novel reinforcement learning agent that proactively considers the intended actions of the human operator using counterfactual reasoning.
Abstract: Recent advances in Reinforcement Learning have allowed automated agents (agents, for short) to achieve a high level of performance across a wide range of tasks; supplementing these agents with human feedback has led to faster and more robust decision-making. The current literature focuses, in large part, on the human's role during the learning phase: human trainers possess a priori knowledge that can help an agent accelerate its learning when the environment is not fully known. In this paper, we study an interactive reinforcement learning setting where the agent and the human have different sensory capabilities and therefore disagree on how they perceive the world (the observed states), while sharing the same reward and transition functions. We show that agents are bound to learn sub-optimal policies if they do not take human advice into account, perhaps surprisingly, even when the human's decisions are less accurate than the agent's own. We propose a counterfactual agent that proactively considers the intended actions of the human operator, and prove that this strategy dominates standard approaches in terms of performance. Finally, we formulate a novel reinforcement learning task in which the goal is to maximize the performance of an autonomous system subject to a budget constraint on the available amount of human advice.
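To make the core intuition concrete, here is a minimal toy sketch (in Python) of why the human's intended action can be informative even when the human is the less accurate decision-maker: when the two parties observe different projections of a latent state, conditioning on the action the human would have taken recovers information the agent's own sensors miss. The environment, observation functions, and all names below are illustrative assumptions, not the paper's actual construction, algorithm, or proofs.

```python
# Toy sketch (assumed setup, not the authors' implementation): agent and
# human share rewards but observe different projections of a latent state.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 4       # latent states u, fully observed by neither party
N_ACTIONS = 2

# Reward of each action in each latent state (shared by agent and human).
REWARD = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

# Each party sees a different deterministic projection of the latent state:
# the agent observes the low bit of u, the human observes the high bit.
def agent_obs(u):
    return u % 2

def human_obs(u):
    return u // 2

def human_intended_action(u):
    # The human acts greedily under a uniform posterior over the latent
    # states consistent with the human's own observation.
    consistent = [s for s in range(N_STATES) if human_obs(s) == human_obs(u)]
    return int(np.argmax(REWARD[consistent].mean(axis=0)))

def naive_action(u):
    # Ignores the human: greedy under the posterior given only the agent's
    # own observation.
    consistent = [s for s in range(N_STATES) if agent_obs(s) == agent_obs(u)]
    return int(np.argmax(REWARD[consistent].mean(axis=0)))

def counterfactual_action(u):
    # Treats the human's intended action as evidence: keep only the latent
    # states in which the human *would have intended* that same action,
    # then act greedily on the refined posterior.
    h = human_intended_action(u)
    consistent = [s for s in range(N_STATES)
                  if agent_obs(s) == agent_obs(u)
                  and human_intended_action(s) == h]
    return int(np.argmax(REWARD[consistent].mean(axis=0)))

# Compare the average reward of the two strategies over many episodes.
states = rng.integers(N_STATES, size=10_000)
for name, policy in [("naive", naive_action),
                     ("counterfactual", counterfactual_action)]:
    r = np.mean([REWARD[u, policy(u)] for u in states])
    print(f"{name:15s} avg reward: {r:.3f}")
```

In this toy setting the counterfactual policy is weakly better by construction: whenever the human's intention distinguishes latent states that look identical to the agent, the refined posterior strictly improves the decision; otherwise it falls back to the agent-only posterior. The paper's actual dominance result concerns the full interactive RL setting and is proved there, not by this sketch.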