Keywords: Reward Model, Robot Learning, Reinforcement Learning, Foundation Model
TL;DR: We introduce Robometer, a reward model trained on over 1M trajectories that outperforms state-of-the-art baselines in reward generalization and downstream robot learning tasks.
Abstract: Current general-purpose robot reward models rely on frame-level progress labels from expert demonstrations. This approach scales poorly to large datasets, where suboptimal and failed trajectories are abundant and absolute progress is ambiguous. To address this, we introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer uses a dual objective: a frame-level loss that anchors reward magnitudes to expert data, and a trajectory-level preference loss that imposes global ordering constraints across trajectories. Together, these enable effective learning from both successful and failed trajectories. To support this formulation at scale, we curate RBM-1M, a dataset of over one million multi-embodiment trajectories containing extensive suboptimal and failure data. Across benchmarks and real-world evaluations, Robometer learns highly generalizable reward functions and improves downstream robot learning performance. Code, model weights, and videos are available at https://anon-robometer.github.io/.
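To make the dual objective concrete, below is a minimal sketch of how such a combined loss could be written in PyTorch. This is an illustration under our own assumptions, not the authors' implementation: the function name, the mean-reward trajectory score, the Bradley-Terry form of the preference term, and the weighting `alpha` are all hypothetical.

```python
# Hypothetical sketch of a dual-objective reward loss: frame-level progress
# regression on expert data plus a trajectory-level preference term.
# Names, shapes, and the Bradley-Terry formulation are assumptions.
import torch
import torch.nn.functional as F

def dual_objective(reward_model, expert_frames, expert_progress,
                   traj_preferred, traj_rejected, alpha=1.0):
    # Frame-level loss: anchor predicted rewards to expert progress labels
    # (assumed to lie in [0, 1]), fixing the reward magnitude.
    pred = reward_model(expert_frames)                 # (N,) predicted progress
    progress_loss = F.mse_loss(pred, expert_progress)  # (N,) expert labels

    # Trajectory-level preference loss: the preferred trajectory (e.g. a
    # success) should score higher than the rejected one (e.g. a failure),
    # imposing a global ordering without needing absolute progress labels.
    r_pos = reward_model(traj_preferred).mean()        # mean reward over frames
    r_neg = reward_model(traj_rejected).mean()
    preference_loss = -F.logsigmoid(r_pos - r_neg)     # Bradley-Terry objective

    return progress_loss + alpha * preference_loss
```

Because the preference term only requires a relative ordering between trajectories, it can draw supervision from the suboptimal and failed trajectories in RBM-1M, while the regression term keeps reward values calibrated on expert data.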
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 5