Objects Matter: Object-Centric World Models Improve Reinforcement Learning in Visually Complex Environments

Published: 20 Jun 2025 · Last Modified: 22 Jul 2025 · RLVG Workshop - RLC 2025 · CC BY 4.0
Keywords: Reinforcement learning, Model-based RL, Object-centric RL, Video object segmentation, Atari, Hollow Knight
TL;DR: We propose an object-centric model-based RL pipeline that integrates recent advances in computer vision, allowing agents to focus on key decision-relevant elements.
Abstract: Deep reinforcement learning has achieved remarkable success in learning control policies from pixels across a wide range of tasks, yet its application remains hindered by low sample efficiency, requiring significantly more environment interactions than humans to reach comparable performance. Model-based reinforcement learning (MBRL) offers a solution by leveraging learnt world models to generate simulated experience, thereby improving sample efficiency. In visually complex environments, small or dynamic elements can be critical for decision-making. However, traditional MBRL methods in pixel-based environments typically rely on auto-encoding with an $L_2$ loss, which is dominated by large image regions and often fails to capture decision-relevant details. To address these limitations, we propose \textbf{OC-STORM}, an object-centric MBRL pipeline that integrates recent advances in computer vision to allow agents to focus on key decision-relevant elements. We demonstrate OC-STORM's practical value in overcoming the limitations of conventional MBRL approaches on both Atari games and the visually complex game Hollow Knight.
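The abstract's core observation, that a plain $L_2$ reconstruction loss is dominated by large image regions, can be made concrete with a small sketch. Below is a minimal PyTorch example of one way to reweight the per-pixel loss toward object regions given segmentation masks; the function name, the additive weighting scheme, and the `obj_weight` value are illustrative assumptions, not the paper's actual OC-STORM objective.

```python
import torch

def object_weighted_l2(pred, target, obj_mask, obj_weight=10.0):
    """Per-pixel L2 reconstruction loss that upweights object regions.

    pred, target: (B, C, H, W) reconstructed / ground-truth frames
    obj_mask:     (B, 1, H, W) binary mask of decision-relevant objects,
                  e.g. from a video object segmentation model (assumption)
    """
    per_pixel = (pred - target).pow(2)              # standard L2 term
    # Background pixels keep weight 1; object pixels get obj_weight.
    weights = 1.0 + (obj_weight - 1.0) * obj_mask   # broadcasts over C
    return (weights * per_pixel).mean()

# Usage: with a uniform L2 loss, a small enemy sprite covering ~1% of the
# frame contributes ~1% of the gradient; upweighting its mask lets the
# world model allocate capacity to it.
pred = torch.rand(4, 3, 64, 64)
target = torch.rand(4, 3, 64, 64)
obj_mask = (torch.rand(4, 1, 64, 64) > 0.99).float()
loss = object_weighted_l2(pred, target, obj_mask)
```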
Submission Number: 20