Keywords: decision-aware model learning, representation learning, model-based reinforcement learning, MBRL, value improvement path, value-equivalent model
TL;DR: A work-in-progress report that connects the idea of value-equivalent model learning with insights from the representation learning community to build robust value-aware models over meaningful sets of value functions.
Abstract: We propose a practical and generalizable decision-aware model-based reinforcement learning algorithm. We extend the frameworks of VAML (Farahmand et al., 2017) and IterVAML (Farahmand, 2018), which have proven difficult to scale to high-dimensional and continuous environments (Lovatto et al., 2020a; Modhe et al., 2021; Voelcker et al., 2022). We propose to use the notion of the Value Improvement Path (Dabney et al., 2020) to improve the generalization of VAML-like model learning. We show theoretically, for linear and tabular spaces, that our proposed algorithm is well-founded, justifying its extension to non-linear and continuous spaces. We also present a detailed implementation proposal based on these ideas.