Keywords: Offline RL, Discount Factor, Bias-variance Tradeoff, Input-dependent MDP
TL;DR: This work investigates the bias-variance tradeoff from shallow planning in Input-dependent MDPs and suggests a new avenue for input-dependent discounting.
Abstract: Offline reinforcement learning has gained considerable popularity for its potential to solve industry challenges. However, real-world environments are often highly stochastic and partially observable, leading long-term planners to overfit to offline data in model-based settings. Input-driven Markov Decision Processes (IDMDPs) offer a way to manage some of this uncertainty by letting designers separate what the agent has control over (states) from what it cannot control (inputs) in the environment. These stochastic external inputs are often difficult to model. Under the assumption that the input model will be imperfect, we investigate the bias-variance tradeoff under shallow planning in IDMDPs. Paving the way to input-driven planning horizons, we also investigate the similarity of optimal planning horizons across inputs given the structure of the input space.
Submission Number: 145