Keywords: reinforcement learning; multi-objective decision making; multi-objective reinforcement learning; learning theory; Markov decision processes
TL;DR: In multi-objective reinforcement learning, we characterise the types of preferences that can be expressed as utility functions, and the utility functions for which an associated optimal policy exists.
Abstract: Multi-objective reinforcement learning (MORL) is an excellent framework for multi-objective sequential decision-making. MORL employs a utility function to aggregate multiple objectives into a single one that expresses a user's preferences. However, MORL still lacks two crucial theoretical analyses of the properties of utility functions: (1) a characterisation of the utility functions for which an associated optimal policy exists, and (2) a characterisation of the types of preferences that can be expressed as utility functions. To address this, we formally characterise the families of preferences and utility functions that MORL should focus on: those for which an optimal policy is guaranteed to exist. We expect our theoretical results to promote the development of novel MORL algorithms that exploit these findings.
Primary Area: Learning theory
Submission Number: 16735