Keywords: Model-based RL, Object-centric RL, Video object segmentation, Atari, Hollow Knight
TL;DR: We propose an object-centric model-based RL pipeline that integrates recent advances in computer vision so that agents can focus on key decision-related elements. Experiments on the Atari 100k benchmark and on Hollow Knight boss fights demonstrate the efficacy of our method.
Abstract: While deep reinforcement learning (RL) from pixels has achieved remarkable success, its sample inefficiency remains a critical limitation for real-world applications. Model-based RL (MBRL) addresses this by learning a world model to generate simulated experience, but standard approaches that rely on pixel-level reconstruction losses often fail to capture small, task-critical objects in complex, dynamic scenes. We posit that an object-centric (OC) representation can direct model capacity toward semantically meaningful entities, improving dynamics prediction and sample efficiency. In this work, we introduce **OC-STORM**, an object-centric MBRL framework that enhances a learned world model with object representations extracted by a pretrained segmentation network. By conditioning on a minimal number of annotated frames, OC-STORM learns to track decision-relevant object dynamics and inter-object interactions without extensive labeling or access to privileged information. Empirical results demonstrate that OC-STORM significantly outperforms the STORM baseline on the Atari 100k benchmark and achieves state-of-the-art sample efficiency on challenging boss fights in the visually complex game **Hollow Knight**. Our findings underscore the potential of integrating OC priors into MBRL for complex visual domains.
Project page: https://oc-storm.weipuzhang.com
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 6126