Keywords: decision-aware model learning, representation learning, model-based reinforcement learning, MBRL, value improvement path, value-equivalent model
TL;DR: A work-in-progress report that connects the idea of value-equivalent model learning with insights from the representation learning community to build robust value-aware models over meaningful sets of value functions.
Abstract: We propose a practical and generalizable decision-aware model-based reinforcement learning algorithm. We extend the frameworks of VAML (Farahmand et al., 2017) and IterVAML (Farahmand, 2018), which have proven difficult to scale to high-dimensional and continuous environments (Lovatto et al., 2020a; Modhe et al., 2021; Voelcker et al., 2022). We propose to use the notion of the Value Improvement Path (Dabney et al., 2020) to improve the generalization of VAML-like model learning. We show theoretically, for linear and tabular spaces, that our proposed algorithm is well-founded, justifying its extension to non-linear and continuous spaces. We also present a detailed implementation proposal based on these ideas.