Keywords: reward modeling, robotics, vla
Abstract: Accurately estimating task progress and deriving robust reward functions from raw video are critical for advancing reinforcement learning (RL) and robotics. While recent Reward Foundation Models (RFMs) have shown promise by fine-tuning Vision-Language Models (VLMs) on robotic datasets, leveraging existing zero-shot VLMs for this task remains difficult due to a significant lack of calibration and a tendency for temporal hallucinations. In this work, we propose SCORE, a novel prompting framework that transforms progress prediction from a black-box logit extraction task into an explicit reasoning-in-language process.
SCORE decomposes the problem into two stages: (1) grounded video description, which ensures the model focuses on task-relevant physical interactions, and (2) semantic progress reasoning, where the VLM jointly predicts a textual completion anchor and a calibrated numerical progress sequence. Our approach effectively closes the performance gap between zero-shot methods and state-of-the-art post-trained RFMs. In offline benchmarks, SCORE outperforms existing baselines in trajectory ranking and cross-task calibration. Furthermore, we demonstrate the real-world utility of SCORE by using it as a reward signal for Diffusion Steering RL (DSRL); our method enables a Vision-Language-Action (VLA) policy to overcome strong initial biases, achieving a +90% success rate improvement over vanilla policies. Finally, we provide an empirical scaling analysis showing that progress prediction capabilities improve significantly with each new generation of frontier VLMs, positioning SCORE as a scalable, high-performance solution for zero-shot reward modeling.
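The two-stage decomposition described in the abstract can be sketched as a prompting pipeline. The snippet below is a minimal illustration, not the paper's implementation: the prompt wording, the JSON output format, and the helper names (`build_score_prompts`, `parse_progress`) are assumptions introduced for illustration, and the actual VLM call is left out.

```python
import json
import re


def build_score_prompts(task: str, num_frames: int) -> tuple[str, str]:
    """Hypothetical two-stage prompts in the spirit of SCORE."""
    # Stage 1: grounded video description -- steer the VLM toward
    # task-relevant physical interactions rather than scene trivia.
    describe = (
        f"You are watching a robot attempt the task: '{task}'. "
        f"For each of the {num_frames} frames, describe only the "
        "physical interactions relevant to completing the task."
    )
    # Stage 2: semantic progress reasoning -- request a textual
    # completion anchor plus a calibrated per-frame progress value.
    reason = (
        "Based on your descriptions, first state in one sentence what "
        "full task completion looks like (the completion anchor). Then "
        "output a JSON object with per-frame progress values in [0, 1], "
        'e.g. {"anchor": "...", "progress": [0.0, 0.25, ...]}.'
    )
    return describe, reason


def parse_progress(reply: str, num_frames: int) -> list[float]:
    """Extract and validate the numerical progress sequence from a reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in VLM reply")
    data = json.loads(match.group(0))
    progress = [float(p) for p in data["progress"]]
    if len(progress) != num_frames:
        raise ValueError("progress length does not match frame count")
    if any(not 0.0 <= p <= 1.0 for p in progress):
        raise ValueError("progress values must lie in [0, 1]")
    return progress
```

Reasoning in language before emitting numbers, and then validating the parsed sequence, is what replaces black-box logit extraction in this framing: the progress values arrive as explicit, checkable text rather than as token probabilities.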
Submission Number: 19