- TL;DR: We find that irrationality from an expert demonstrator can help a learner infer their preferences.
- Abstract: Specifying reward functions is difficult, which motivates the area of reward inference: learning rewards from human behavior. The starting assumption in the area is that human behavior is optimal given the desired reward function, but in reality people have many different forms of irrationality, from noise to myopia to risk aversion and beyond. This fact seems like it will be strictly harmful to reward inference: it is already hard to infer the reward from rational behavior, and noise and systematic biases make actions have less direct of a relationship to the reward. Our insight in this work is that, contrary to expectations, irrationality can actually help rather than hinder reward inference. For some types and amounts of irrationality, the expert now produces more varied policies compared to rational behavior, which help disambiguate among different reward parameters -- those that otherwise correspond to the same rational behavior. We put this to the test in a systematic analysis of the effect of irrationality on reward inference. We start by covering the space of irrationalities as deviations from the Bellman update, simulate expert behavior, and measure the accuracy of inference to contrast the different types and study the gains and losses. We provide a mutual information-based analysis of our findings, and wrap up by discussing the need to accurately model irrationality, as well as to what extent we might expect (or be able to train) real people to exhibit helpful irrationalities when teaching rewards to learners.
- Keywords: preference inference, inverse reinforcement learning, reward inference, irrationality