Characterising Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: inverse reinforcement learning, partial identifiability, hyperbolic discounting, discounting, reward learning, preference elicitation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We characterise partial identifiability of the reward function in inverse reinforcement learning for agents that use hyperbolic (or other forms of non-exponential) discounting.
Abstract: The aim of inverse reinforcement learning (IRL) is to infer an agent's *preferences* from their *behaviour*. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central difficulties in IRL is that multiple preferences may lead to the same behaviour. That is, $R$ is typically underdetermined by $\pi$, which means that $R$ is only *partially identifiable*. Recent work has characterised the extent of this partial identifiability for different types of agents, including *optimal* agents and *Boltzmann-rational* agents. However, work so far has only considered agents that discount future reward exponentially. This is a serious limitation, since extensive work in the behavioural sciences suggests that humans are better modelled as discounting *hyperbolically*. In this work, we characterise the partial identifiability in IRL for agents that use non-exponential discounting. Our results apply in particular to agents that discount hyperbolically, but they also extend more generally to agents that use other types of discounting. We show that IRL, in these cases, is unable to infer enough information about $R$ to identify the correct optimal policy. This suggests that IRL alone is insufficient to adequately characterise the preferences of such agents.
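For concreteness, the two discounting schemes contrasted in the abstract can be written as follows. This is a standard illustration rather than notation taken from the submission itself: $\gamma \in (0,1)$ is the usual exponential discount rate, $k > 0$ the usual hyperbolic discount parameter, and $d(t)$ the weight placed on reward received $t$ steps in the future:

$$d_{\mathrm{exp}}(t) = \gamma^{t}, \qquad d_{\mathrm{hyp}}(t) = \frac{1}{1 + kt}.$$

Exponential discounting is the unique form for which the ratio $d(t+1)/d(t)$ is constant in $t$; for any other discount function, including the hyperbolic one, this ratio varies over time, so the agent's preferences between delayed rewards can reverse as time passes (time inconsistency). This is the structural feature that separates the non-exponential setting from the exponential setting covered by prior work.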
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3737