Video-Enhanced Offline Reinforcement Learning: A Model-Based Approach

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Offline reinforcement learning (RL) enables policy optimization using static datasets, avoiding the risks and costs of extensive real-world exploration. However, it struggles with suboptimal offline behaviors and inaccurate value estimation due to the lack of environmental interaction. We present Video-Enhanced Offline RL (VeoRL), a model-based method that constructs an interactive world model from diverse, unlabeled video data readily available online. Leveraging model-based behavior guidance, our approach transfers commonsense knowledge of control policy and physical dynamics from natural videos to the RL agent within the target domain. VeoRL achieves substantial performance gains (over 100% in some cases) across visual control tasks in robotic manipulation, autonomous driving, and open-world video games. Project page: https://panmt.github.io/VeoRL.github.io.
Lay Summary: Training AI systems to perform real-world tasks often requires risky and costly trial-and-error in physical environments. While existing methods use pre-recorded datasets to avoid this, they face a critical limitation: robots or AI agents trained this way often make poor decisions because they can’t interact with the real world to test and refine their understanding. VeoRL solves this by letting AI learn from available online videos—like robot demonstrations, car dashcam footage, or gameplay streams—to build a “virtual playground.” By analyzing video patterns, VeoRL automatically learns real-world physics and control strategies. This “common sense” helps AI avoid dangerous mistakes. For example, robots master tasks after “watching” human-like movements in videos, while self-driving systems improve safety by practicing with traffic footage. Tested in robotics, driving, and gaming, VeoRL doubled performance in some cases compared to standard methods—proving AI can learn complex skills by observing first, acting later, just like humans do.
Primary Area: Reinforcement Learning->Batch/Offline
Keywords: Offline Reinforcement Learning, Model-based Reinforcement Learning, Visuomotor Control
Submission Number: 765