On shallow planning under partial observability

Published: 01 Jun 2024, Last Modified: 07 Aug 2024 · Deployable RL @ RLC 2024 · CC BY 4.0
Keywords: Discount factor; Bias; POMDP
TL;DR: New theoretical results to capture the impact of the discount factor on the value function bias and its link to partial observability.
Abstract: Formulating a real-world problem in the Reinforcement Learning framework involves non-trivial design choices, e.g., selecting a discount factor for the learning objective (the discounted cumulative reward), which sets the agent's planning horizon. This work investigates the impact of the discount factor on the bias-variance trade-off as a function of structural parameters of the underlying Markov Decision Process. Our results support the idea that a shorter planning horizon can be beneficial, especially under partial observability.
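To make the role of the discount factor concrete, here is a minimal illustrative sketch (not code from the paper): it computes the discounted cumulative reward for a reward sequence and the common rule-of-thumb effective horizon of roughly 1/(1-gamma), showing how a smaller gamma shortens the planning horizon.

```python
def discounted_return(rewards, gamma):
    """Discounted cumulative reward: sum over t of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def effective_horizon(gamma):
    """Rule of thumb: rewards beyond ~1/(1-gamma) steps are heavily down-weighted."""
    return 1.0 / (1.0 - gamma)

# A constant unit-reward stream makes the effect of gamma easy to see.
rewards = [1.0] * 100
for gamma in (0.9, 0.99):
    print(f"gamma={gamma}: return={discounted_return(rewards, gamma):.2f}, "
          f"effective horizon~{effective_horizon(gamma):.0f}")
```

With gamma = 0.9 the return saturates near 10 and the agent effectively plans about 10 steps ahead, while gamma = 0.99 weights rewards roughly 100 steps out; the paper's point is that the longer horizon is not always worth the added variance and bias, particularly under partial observability.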
Submission Number: 10