Essay.08-WeinanQian-2100017831

03 Dec 2023 (modified: 26 Jan 2024) · PKU 2023 Fall CoRe Submission · CC BY 4.0
Keywords: Utility, Preference-based Reinforcement Learning, Bradley-Terry Model, Linear Utility Function
Abstract: Utility is central to understanding human decision-making, and the principle of maximum expected utility is widely adopted across many domains, including Preference-based Reinforcement Learning (PbRL). PbRL addresses several challenges in Reinforcement Learning (RL), particularly those concerning reward specification, by grounding the reward in human utility. By employing a utility function that also serves as the reward function, PbRL enables agents to make decisions that align more closely with human intentions. It is therefore important to study how the utility function can be represented and learned within PbRL. This paper introduces two approaches for representing and learning the utility function from collected human preferences between trajectory pairs, and analyzes their respective merits and limitations in practical scenarios.
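The abstract does not spell out the two approaches, but as one concrete illustration of the setting it describes, the sketch below fits a linear utility function from pairwise trajectory preferences under the Bradley-Terry model, where P(τ_a ≻ τ_b) = σ(w·(φ(τ_a) − φ(τ_b))). This is a minimal sketch under stated assumptions, not the paper's method; the function names and the synthetic feature data are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_linear_utility(features_a, features_b, prefs, lr=0.1, epochs=500):
    """Fit weights w of a linear utility u(tau) = w @ phi(tau) under the
    Bradley-Terry model: P(tau_a > tau_b) = sigmoid(w @ (phi_a - phi_b)).

    features_a, features_b: (N, d) feature vectors of the two trajectories
                            in each compared pair (hypothetical features).
    prefs: (N,) array with 1 if tau_a was preferred, 0 if tau_b was.
    """
    n, _ = features_a.shape
    diff = features_a - features_b               # (N, d) feature differences
    w = np.zeros(features_a.shape[1])
    for _ in range(epochs):
        p = sigmoid(diff @ w)                    # predicted P(tau_a > tau_b)
        grad = diff.T @ (prefs - p) / n          # gradient of mean log-likelihood
        w += lr * grad                           # gradient ascent step
    return w

# Toy usage: synthetic preferences generated from a hidden weight vector.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -0.5, 0.2])
phi_a = rng.normal(size=(200, 3))
phi_b = rng.normal(size=(200, 3))
prefs = (sigmoid((phi_a - phi_b) @ true_w) > rng.uniform(size=200)).astype(float)
print("recovered weights:", fit_linear_utility(phi_a, phi_b, prefs))
```

With a linear utility, Bradley-Terry preference learning reduces to logistic regression on trajectory feature differences; the learned utility can then double as the reward signal for a standard RL algorithm, as the abstract describes.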
Submission Number: 165