Bootstrapping the Expressivity with Model-based Planning

Kefan Dong; Yuping Luo; Tengyu Ma

Bootstrapping the Expressivity with Model-based Planning

Kefan Dong, Yuping Luo, Tengyu Ma

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: We compare deep model-based and model-free RL algorithms by studying the approximability of $Q$-functions, policies, and dynamics by neural networks.

Abstract: We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize many real-world MDPs also have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on MuJoCo benchmark tasks.

Keywords: reinforcement learning theory, model-based reinforcement learning, planning, expressivity, approximation theory, deep reinforcement learning theory

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/bootstrapping-the-expressivity-with-model/code)

Original Pdf: pdf

9 Replies

Loading