ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning

Hongyin Zhang; Zifeng Zhuang; Han Zhao; Pengxiang Ding; Hongchao Lu; Donglin Wang

Published: 01 May 2025, Last Modified: 23 Jul 2025. ICML 2025 poster. CC BY 4.0

TL;DR: We introduce Reinforced Robot GPT (ReinboT), a novel end-to-end VLA model that integrates the RL principle of maximizing cumulative reward.

Abstract: Vision-Language-Action (VLA) models have shown great potential in general robotic decision-making tasks via imitation learning. However, the variable quality of training data often constrains the performance of these models. On the other hand, offline Reinforcement Learning (RL) excels at learning robust policy models from mixed-quality data. In this paper, we introduce Reinforced Robot GPT (ReinboT), a novel end-to-end VLA model that integrates the RL principle of maximizing cumulative reward. ReinboT achieves a deeper understanding of the data quality distribution by predicting dense returns that capture the nuances of manipulation tasks. The dense return prediction capability enables the robot to generate more robust decision-making actions, oriented towards maximizing future benefits. Extensive experiments show that ReinboT achieves state-of-the-art performance on the CALVIN mixed-quality dataset and exhibits superior few-shot learning and out-of-distribution generalization capabilities in real-world tasks.
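
Illustration: The "cumulative reward" that the abstract refers to is commonly expressed as a return-to-go, the (optionally discounted) sum of rewards from the current step to the end of a trajectory. The sketch below shows that generic computation only. It is not ReinboT's implementation: the abstract does not specify the reward design or any discount factor, so the function name `returns_to_go`, the `gamma` parameter, and the sample reward values are all assumptions made for illustration.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Compute the return-to-go R_t = r_t + gamma * R_{t+1} for every step.

    `rewards` is a hypothetical per-step dense reward sequence for one
    trajectory; `gamma` is an assumed discount factor (1.0 = undiscounted).
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


# Hypothetical dense rewards for a three-step trajectory.
example_rewards = [0.0, 0.5, 1.0]
undiscounted = returns_to_go(example_rewards)            # [1.5, 1.5, 1.0]
discounted = returns_to_go(example_rewards, gamma=0.5)   # [0.5, 1.0, 1.0]
```

Under this reading, a trajectory whose steps carry higher return-to-go values is one whose remaining actions lead to more reward, which is the kind of signal a return-predicting model could use to distinguish good demonstrations from poor ones.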

Lay Summary: Robots often learn by copying human actions, but the quality of these examples can vary a lot, which hurts performance. We introduce ReinboT, a new model that helps robots make better decisions by using ideas from reinforcement learning, where the goal is to choose actions that lead to the best long-term results. ReinboT learns to predict how useful each step is, helping it focus on good examples and ignore bad ones. This makes the robot’s actions more reliable and smarter over time. In tests, ReinboT outperformed other methods on a challenging dataset with mixed-quality data and showed strong results even in new, unseen tasks. Our work helps robots become better at learning complex tasks, even from imperfect data.

Primary Area: Applications->Robotics

Keywords: VLA Model, Reinforcement Learning, Robotic Manipulation

Submission Number: 10118
