Keywords: Model-based Reinforcement Learning, Compounding errors, Multi-step models, Noisy observations
TL;DR: In the context of model-based reinforcement learning, we study a training objective for one-step dynamics models that jointly optimizes prediction error across multiple future horizons.
Abstract: In model-based reinforcement learning, most algorithms rely on simulating trajectories from one-step dynamics models learned from data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-step objective to train one-step models. Our objective is a weighted sum of the mean squared error (MSE) loss at various future horizons. We find that this new loss is particularly useful when the data is noisy (additive Gaussian noise in the observations), which is often the case in real-life environments. We show on a variety of tasks (environments or datasets) that models learned with this loss achieve a significant improvement in R2-score averaged over future prediction horizons. To our surprise, in the pure batch reinforcement learning setting, we find that models trained with the multi-step loss perform only marginally better than the baseline. Furthermore, this improvement is observed only for small loss horizons, in contrast to the trend seen in the R2-score on the respective datasets.
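A minimal sketch of the objective described in the abstract, assuming a feed-forward one-step dynamics model in PyTorch and a user-chosen weighting over horizons; the model architecture, weighting scheme, and rollout details below are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OneStepModel(nn.Module):
    # Simple feed-forward one-step dynamics model: predicts s_{t+1} from (s_t, a_t).
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def multi_step_mse_loss(model, states, actions, weights):
    # states:  (batch, H+1, state_dim) observed state segment
    # actions: (batch, H,   action_dim) actions taken along the segment
    # weights: length-H sequence of horizon weights (assumed, e.g. uniform or decaying)
    # The one-step model is rolled out from states[:, 0]; at each horizon h its
    # prediction is compared against the observed state states[:, h + 1].
    pred = states[:, 0]
    loss = 0.0
    for h, w in enumerate(weights):
        pred = model(pred, actions[:, h])  # roll the model forward on its own predictions
        loss = loss + w * F.mse_loss(pred, states[:, h + 1])
    return loss

With weights = [1.0] this reduces to the standard one-step MSE; assigning nonzero weight to longer horizons penalizes the compounding of rollout errors directly.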
Submission Number: 5