Task-aware world model learning with meta weighting via bi-level optimization

Published: 21 Sept 2023 · Last Modified: 02 Nov 2023 · NeurIPS 2023 poster
Keywords: Model-based reinforcement learning, world model, generative model, meta-learning, bi-level optimization
TL;DR: A task-aware framework for world model learning in model-based reinforcement learning.
Abstract: Aligning the world model with the environment for the agent’s specific task is crucial in model-based reinforcement learning. While value-equivalent models may achieve better task awareness than maximum-likelihood models, they sacrifice a large amount of semantic information and face implementation issues. To combine the benefits of both types of models, we propose the Task-aware Environment Modeling Pipeline with bi-level Optimization (TEMPO), a bi-level model learning framework that introduces an additional level of optimization on top of a maximum-likelihood model by incorporating a meta weighter network that weights each training sample. The meta weighter in the upper level learns to generate novel sample weights by minimizing a proposed task-aware model loss. The model in the lower level focuses on important samples while maintaining rich semantic information in its state representations. We evaluate TEMPO on a variety of continuous and discrete control tasks from the DeepMind Control Suite and Atari video games. Our results demonstrate that TEMPO achieves state-of-the-art asymptotic performance, training stability, and convergence speed.
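To make the bi-level structure concrete, the sketch below illustrates the general meta-weighting pattern the abstract describes (in the style of Meta-Weight-Net) on a toy regression stand-in for world model learning: the lower level takes a differentiable virtual step on a weighted per-sample loss, and the upper level updates the weighter by backpropagating a task-aware loss through that virtual step. The linear model, the MSE losses, the weighter architecture, and all hyperparameters here are illustrative assumptions, not the architectures or the task-aware model loss defined in the paper.

```python
# Minimal bi-level meta-weighting sketch (assumptions: toy linear "world model",
# MSE standing in for both the maximum-likelihood loss and the task-aware loss).
import torch

torch.manual_seed(0)

# Toy data: a model-training batch (lower level) and a task-aware batch (upper level).
x_train, y_train = torch.randn(64, 8), torch.randn(64, 1)
x_task, y_task = torch.randn(16, 8), torch.randn(16, 1)

# Lower-level model parameters kept as raw tensors so the virtual update stays differentiable.
W = (0.1 * torch.randn(8, 1)).requires_grad_(True)
b = torch.zeros(1, requires_grad=True)
model_lr = 1e-2

# Upper-level meta weighter: maps each sample's loss to a weight in (0, 1).
weighter = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 1), torch.nn.Sigmoid(),
)
meta_opt = torch.optim.Adam(weighter.parameters(), lr=1e-3)

for step in range(100):
    # 1) Per-sample model loss and the weighted lower-level objective.
    per_sample = ((x_train @ W + b - y_train) ** 2).mean(dim=1, keepdim=True)
    weights = weighter(per_sample.detach())
    weighted_loss = (weights * per_sample).mean()

    # 2) Virtual lower-level step: one differentiable SGD update of the model.
    gW, gb = torch.autograd.grad(weighted_loss, (W, b), create_graph=True)
    W_virt, b_virt = W - model_lr * gW, b - model_lr * gb

    # 3) Upper level: task-aware loss on the virtually updated model,
    #    backpropagated through the virtual step into the weighter.
    task_loss = ((x_task @ W_virt + b_virt - y_task) ** 2).mean()
    meta_opt.zero_grad()
    task_loss.backward()
    meta_opt.step()

    # 4) Real lower-level step using the freshly updated weighter.
    per_sample = ((x_train @ W + b - y_train) ** 2).mean(dim=1, keepdim=True)
    weights = weighter(per_sample.detach()).detach()
    loss = (weights * per_sample).mean()
    gW, gb = torch.autograd.grad(loss, (W, b))
    with torch.no_grad():
        W -= model_lr * gW
        b -= model_lr * gb
```

The key design choice this pattern captures is that the weighter is never trained on the model loss itself: its gradient comes only from the task-aware upper-level objective, flowing through the virtual model update, so the weights it learns reflect each sample's usefulness for the task rather than its likelihood.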
Supplementary Material: pdf
Submission Number: 3253