Object-Centric Reward Learning from Action-Free Videos for Long-Horizon Manipulation Beyond Teleoperation

Andrea Protopapa; Giuseppe Averta; Francesca Pistilli; Davide Buoso

Published: 31 May 2026, Last Modified: 31 May 2026

Venue: Beyond Teleop workshop, ICRA 2026 (Poster)

License: CC BY 4.0

Keywords: Robot learning, Reinforcement learning, Graph neural networks, Representation learning, Inverse RL

TL;DR: We learn object-centric graph rewards from action-free videos for robot RL, showing that suppressing embodiment-specific motion improves policy learning and reveals semantic subtask structure for long-horizon manipulation.

Abstract: Teleoperated demonstrations have enabled substantial progress in robot learning, but they remain constrained by robot-specific interfaces, limited dexterity, and expensive data collection. We study how action-free video demonstrations can instead provide supervision for learning task-progress rewards that are later used for reinforcement learning. We propose an object-centric inverse reinforcement learning (IRL) framework that represents each observation as a graph of detected objects and relations, learns a temporally aligned latent space from videos, and defines dense rewards by distance to a goal embedding. A weighted graph pooling mechanism emphasizes task-relevant object dynamics while suppressing robot-dominated motion, encouraging rewards to reflect semantic progress rather than embodiment-specific trajectories. On a structured manipulation benchmark, the learned reward improves downstream RL performance over pixel-based and graph-based baselines, outperforming a hand-designed environment reward. On a longer-horizon task, the learned reward exhibits interpretable stage-wise transitions aligned with manipulation phases, suggesting a path toward automatic subtask discovery. These results support object-centric reward learning as a mechanism for extracting reusable manipulation structure from video and simulation data beyond teleoperation. Although our experiments use simulated demonstrations, the graph-based reward representation is designed to abstract task progress through objects and relations, making it a promising bridge toward future reward learning from human action-free videos.
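
The reward construction described in the abstract (pool per-object graph features with weights that downplay the robot, then score progress by distance to a goal embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax weighting, the Euclidean distance, the fixed node scores, the 2-D node features, and the function names `weighted_graph_pool` and `goal_distance_reward` are all assumptions made for illustration. In the paper the embedding is learned from videos, whereas here it is a plain weighted average.

```python
import numpy as np

def weighted_graph_pool(node_feats, node_scores):
    """Pool per-object features into one embedding using softmax weights.

    node_scores are hypothetical relevance scores; a very low score
    effectively removes a node (here, the robot) from the pooled embedding.
    """
    scores = np.asarray(node_scores, dtype=float)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ np.asarray(node_feats, dtype=float)

def goal_distance_reward(embedding, goal_embedding):
    """Dense reward: negative Euclidean distance to the goal embedding."""
    return -float(np.linalg.norm(embedding - goal_embedding))

# Toy scene (assumed): rows are [robot gripper, cube, target].
goal_nodes = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
far_nodes = np.array([[5.0, 5.0], [0.0, 0.0], [1.0, 1.0]])
near_nodes = np.array([[-3.0, 2.0], [0.9, 0.9], [1.0, 1.0]])

# Assumed fixed scores: the robot node is strongly down-weighted.
scores = np.array([-10.0, 2.0, 2.0])

z_goal = weighted_graph_pool(goal_nodes, scores)
r_far = goal_distance_reward(weighted_graph_pool(far_nodes, scores), z_goal)
r_near = goal_distance_reward(weighted_graph_pool(near_nodes, scores), z_goal)

print(f"reward (cube far from target):  {r_far:.3f}")
print(f"reward (cube near target):      {r_near:.3f}")
```

In this toy setup the gripper position differs widely between the three scenes, yet the reward ordering is driven almost entirely by how close the cube is to the target, which is the intuition behind suppressing robot-dominated motion.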

Submission Number: 24
