- Abstract: It is challenging to design a reward function for complex, real-world tasks. Reward learning algorithms let one instead infer a reward function from data. However, multiple reward functions often explain the data equally well, even in the limit of infinite data. Prior work has focused on situations where the reward function is uniquely recoverable, by introducing additional assumptions or data sources. By contrast, we formally characterise this partial identifiability for popular data sources such as demonstrations and trajectory preferences. We analyse the impact of this ambiguity on downstream tasks such as policy optimisation, including under shifts in environment dynamics. These results have implications for the practical design and selection of data sources for reward learning.
- One-sentence Summary: Theoretical analysis of the partial identifiability of reward functions in core reward learning methods, including IRL and preference comparisons, with links to the lattice structure of invariances of core RL objects.