How to Specify Reinforcement Learning Objectives

Published: 04 Jun 2024, Last Modified: 19 Jul 2024 | Finding the Frame: RLC 2024 Poster | CC BY 4.0
Keywords: reward functions, alignment, reinforcement learning, problem specification
TL;DR: With a focus on alignment, we discuss how to specify reinforcement learning (RL) objectives in practice through careful design of reward functions and discounting.
Abstract: We discuss how to specify reinforcement learning (RL) objectives in practice through careful design of reward functions and discounting. We focus specifically on defining a _human-aligned_ objective for the RL problem, and we argue that reward shaping and decreasing discounting, if desired, are part of the RL solution, not the problem, and should be deferred to a second step beyond this paper's focus. We provide tools for diagnosing misalignment in RL objectives, such as finding preference mismatches between the RL objective and human judgments and examining the indifference point between risky and safe trajectory lotteries. We discuss common pitfalls that can lead to misalignment, including naive reward shaping, trial-and-error reward tuning, and improper handling of discount factors. We also sketch candidate best practices for designing interpretable, aligned RL objectives and discuss open problems that hinder the design of aligned RL objectives in practice.
Submission Number: 20
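
To make the indifference-point diagnostic mentioned in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper: the reward values, discount factor, and human risk threshold below are illustrative assumptions. It computes the success probability at which a return-maximizing agent is indifferent between a safe trajectory and a risky lottery, which can then be compared against the level of risk a human designer would actually accept.

```python
# Hypothetical sketch of the indifference-point diagnostic (illustrative only;
# all numbers below are assumed, not taken from the paper).

def discounted_return(rewards, gamma):
    """Discounted return of one trajectory: sum_t gamma^t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def indifference_probability(safe_return, good_return, bad_return):
    """Probability p at which a return-maximizing agent is indifferent between
    the safe return and a lottery paying good_return w.p. p, bad_return w.p. 1-p:
        p * good + (1 - p) * bad = safe  =>  p = (safe - bad) / (good - bad).
    """
    return (safe_return - bad_return) / (good_return - bad_return)

if __name__ == "__main__":
    gamma = 0.99  # assumed discount factor of the specified objective

    # Illustrative per-step reward sequences.
    safe = [1.0] * 20                  # steady progress, no catastrophe
    risky_good = [5.0] * 20            # large payoff if the gamble succeeds
    risky_bad = [-100.0] + [0.0] * 19  # catastrophic failure otherwise

    p_star = indifference_probability(
        discounted_return(safe, gamma),
        discounted_return(risky_good, gamma),
        discounted_return(risky_bad, gamma),
    )
    print(f"Agent is indifferent when the gamble succeeds w.p. {p_star:.3f}")

    # Diagnostic: compare against the success probability a human designer
    # would actually require before accepting this risk (assumed threshold).
    human_required_p = 0.999
    if p_star < human_required_p:
        print("Potential misalignment: the objective tolerates more risk "
              "than the designer intends.")
```

A large gap between the objective's implied indifference probability and the designer's acceptable risk level is one signal that the reward function or discounting is misspecified.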