How to Specify Reinforcement Learning Objectives

Published: 04 Jun 2024, Last Modified: 19 Jul 2024 | Finding the Frame: RLC 2024 Poster | CC BY 4.0
Keywords: reward functions, alignment, reinforcement learning, problem specification
TL;DR: With a focus on alignment, we discuss how to specify reinforcement learning (RL) objectives in practice through careful design of reward functions and discounting.
Abstract: We discuss how to specify reinforcement learning (RL) objectives in practice through careful design of reward functions and discounting. We focus specifically on defining a _human-aligned_ objective for the RL problem, and we argue that reward shaping and decreasing discounting, if desired, are part of the RL solution, not the problem, and should be deferred to a second step beyond this paper's focus. We provide tools for diagnosing misalignment in RL objectives, such as finding preference mismatches between the RL objective and human judgments and examining the indifference point between risky and safe trajectory lotteries. We discuss common pitfalls that can lead to misalignment, including naive reward shaping, trial-and-error reward tuning, and improper handling of discount factors. We also sketch candidate best practices for designing interpretable, aligned RL objectives and discuss open problems that hinder the design of aligned RL objectives in practice.
Submission Number: 20
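
To make the indifference-point diagnostic mentioned in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper: the reward values, discount factor, and human risk threshold below are illustrative assumptions. It computes the success probability at which a return-maximizing agent is indifferent between a safe trajectory and a risky lottery, which can then be compared against the level of risk a human designer would actually accept.

```python
# Hypothetical sketch of the indifference-point diagnostic (illustrative only;
# all numbers below are assumed, not taken from the paper).

def discounted_return(rewards, gamma):
    """Discounted return of one trajectory: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def indifference_probability(safe_return, good_return, bad_return):
    """Probability p at which a return-maximizing agent is indifferent between
    the safe return and a lottery paying good_return w.p. p, bad_return w.p. 1-p:
        p * good + (1 - p) * bad = safe  =>  p = (safe - bad) / (good - bad).
    """
    return (safe_return - bad_return) / (good_return - bad_return)

if __name__ == "__main__":
    gamma = 0.99  # assumed discount factor of the specified objective

    # Illustrative per-step reward sequences.
    safe = [1.0] * 20                  # steady progress, no catastrophe
    risky_good = [5.0] * 20            # large payoff if the gamble succeeds
    risky_bad = [-100.0] + [0.0] * 19  # catastrophic failure otherwise

    p_star = indifference_probability(
        discounted_return(safe, gamma),
        discounted_return(risky_good, gamma),
        discounted_return(risky_bad, gamma),
    )
    print(f"Agent is indifferent when the gamble succeeds w.p. {p_star:.3f}")

    # Diagnostic: compare against the success probability a human designer
    # would actually require before accepting this risk (assumed threshold).
    human_required_p = 0.999
    if p_star < human_required_p:
        print("Potential misalignment: the objective tolerates more risk "
              "than the designer intends.")
```

A large gap between the objective's implied indifference probability and the designer's acceptable risk level is one signal that the reward function or discounting is misspecified.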