Keywords: Offline RL, World Models, Model-based RL, Action chunking, Long-horizon tasks
TL;DR: Action chunking enables long-horizon rollouts, scaling offline model-based RL to complex long-horizon tasks.
Abstract: In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion,
can provide a scalable recipe for tackling complex, long-horizon tasks in offline RL.
Model-based value expansion fits an on-policy value function using length-$n$ imaginary rollouts generated by the current policy and a learned dynamics model.
While a larger $n$ reduces bias in value bootstrapping, it also amplifies the model errors that accumulate over long horizons, degrading future predictions.
We address this trade-off with
an *action-chunk* model that predicts a future state from a sequence of actions (an "action chunk")
instead of a single action, which reduces compounding errors.
In addition, instead of directly training a policy to maximize rewards,
we employ rejection sampling from an expressive behavioral action-chunk policy,
which prevents model exploitation from out-of-distribution actions.
We call this recipe **Model-Based RL with Action Chunks (MAC)**.
Through experiments on highly challenging tasks with large-scale datasets of up to $100$M transitions,
we show that MAC achieves the best performance among offline model-based RL algorithms,
especially on long-horizon tasks.
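
As a rough illustration of the ingredients named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy 1-D state, a hypothetical chunk length of 4, and stand-in functions (`chunk_model`, `behavior_sample`, `toy_value`) in place of the learned dynamics model, behavioral policy, and value function. Rejection sampling is shown in its best-of-N form, which is one common variant; the abstract does not specify how candidates are scored.

```python
import random

CHUNK_LEN = 4  # hypothetical chunk length, chosen only for illustration


def chunk_model(state, chunk):
    """Toy stand-in for a learned action-chunk model.

    A single call maps (state, sequence of actions) to the state reached
    after the whole chunk, rather than stepping one action at a time.
    """
    return state + sum(chunk)


def behavior_sample(state, rng):
    """Toy stand-in for the behavioral action-chunk policy."""
    return [rng.uniform(-1.0, 1.0) for _ in range(CHUNK_LEN)]


def toy_value(state, goal=3.0):
    """Toy stand-in for a learned value function (higher is better)."""
    return -abs(goal - state)


def best_of_n_chunk(state, num_samples, rng):
    """Best-of-N rejection sampling over behavioral action chunks.

    Every candidate comes from the behavioral policy, so the selected
    chunk stays within the support of the data distribution.
    """
    candidates = [behavior_sample(state, rng) for _ in range(num_samples)]
    return max(candidates, key=lambda c: toy_value(chunk_model(state, c)))


def n_step_value_target(rewards, bootstrap_value, gamma):
    """Discounted sum of n imagined rewards plus a discounted bootstrap value."""
    target = 0.0
    for k, reward in enumerate(rewards):
        target += (gamma ** k) * reward
    return target + (gamma ** len(rewards)) * bootstrap_value


rng = random.Random(0)
chunk = best_of_n_chunk(0.0, num_samples=16, rng=rng)
next_state = chunk_model(0.0, chunk)  # one model call covers CHUNK_LEN steps
```

In this sketch, a rollout of $n$ environment steps needs only $n / \text{CHUNK\_LEN}$ model calls, which is the intuition behind reduced compounding error; the value target then bootstraps from the state at the end of the rollout.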
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 6369