Keywords: Robot learning, Reinforcement learning, Graph neural networks, Representation learning, Inverse RL
TL;DR: We learn GNN-based object-centric rewards from video demonstrations that improve RL performance and reveal semantic task structure for long-horizon manipulation.
Abstract: Learning long-horizon manipulation skills from visual demonstrations remains challenging because reward design is difficult, manual subtask annotation is expensive, and pixel-based representations often generalize poorly across visual variations. Imitation learning (IL) enables efficient policy acquisition from demonstrations, but policies trained purely by imitation are often brittle outside the demonstration distribution. Reinforcement learning (RL), in contrast, can improve policies through interaction, but typically requires carefully engineered, often sparse reward functions designed by domain experts.
In this work, we propose a graph-based inverse reinforcement learning (IRL) framework that bridges IL and RL by learning semantically grounded reward functions from demonstrations.
Rather than relying directly on raw images, we represent the scene as a graph of detected objects and their relations and encode this graph with a graph neural network. The node embeddings are then aggregated by a weighted pooling mechanism that emphasizes the task entities whose dynamics are most relevant.
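To make the encoder description concrete, the following is a minimal numpy sketch of one message-passing layer over an object graph followed by attention-style weighted pooling. It is an illustration under stated assumptions, not the authors' architecture: the function name `encode_scene`, the mean-aggregation message rule, the single layer, the ReLU, and the softmax relevance scores are all hypothetical choices, and the node features and adjacency matrix stand in for whatever the object detector and relation extractor actually produce.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def encode_scene(node_feats, adj, W_self, W_msg, w_score):
    """Hypothetical sketch: encode an object graph into one latent vector.

    node_feats: (N, F) per-object features (assumed, e.g. pose and class)
    adj:        (N, N) 0/1 relation matrix between objects (assumed)
    W_self, W_msg: (F, H) weights of a single message-passing layer
    w_score:    (H,) vector that scores how relevant each object is
    """
    # one round of mean-aggregation message passing over neighbours
    deg = adj.sum(axis=1, keepdims=True)
    msgs = (adj @ node_feats) / np.maximum(deg, 1.0)
    h = relu(node_feats @ W_self + msgs @ W_msg)   # (N, H) node embeddings
    # weighted pooling: scores decide which objects dominate the graph embedding
    alpha = softmax(h @ w_score)                   # (N,), non-negative, sums to 1
    return alpha @ h, alpha                        # (H,) graph embedding, (N,) weights

# usage: three objects in a chain of relations, random (untrained) weights
rng = np.random.default_rng(0)
N, F, H = 3, 4, 8
x = rng.normal(size=(N, F))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
z, alpha = encode_scene(x, adj, rng.normal(size=(F, H)),
                        rng.normal(size=(F, H)), rng.normal(size=H))
print(z.shape, alpha.round(3))
```

In a trained model the pooling weights `alpha` would be learned so that objects whose state changes matter for the task receive higher weight; here they are random and only show the data flow.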
The learned representation defines a dense reward based on the latent-space distance to the goal, which tracks progress through the task. We further observe that the learned reward evolves with an interpretable stage-wise structure that reflects semantic progress through the task. This structure is especially useful in long-horizon settings, where it can provide a signal for identifying subtask boundaries without manual segmentation. The framework is therefore suitable both for direct RL with a full-task reward and, in more complex tasks, for subtask-level policy learning.
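A minimal sketch of the reward and of how a stage-wise reward curve could be turned into candidate subtask boundaries is shown below. The reward is taken to be the negative Euclidean distance between each latent state and the goal latent, which is one natural reading of "latent-space distance to the goal"; the exact distance used in the paper is not specified here. The boundary rule (flag a step wherever the reward rises by more than `min_jump`) and both function names are hypothetical heuristics for illustration, not the authors' segmentation procedure.

```python
import numpy as np

def dense_reward(z_traj, z_goal):
    # r_t = -||z_t - z_goal||_2: closer to zero means closer to the goal latent
    return -np.linalg.norm(z_traj - z_goal, axis=1)

def stage_boundaries(rewards, min_jump):
    # hypothetical heuristic: a boundary is a step where the reward rises sharply,
    # i.e. the latent state moves abruptly closer to the goal
    jumps = np.diff(rewards)
    return [int(t) + 1 for t in np.flatnonzero(jumps > min_jump)]

# usage: a toy latent trajectory with three plateaus (three "stages")
z_goal = np.zeros(2)
z_traj = np.array([[4.0, 0.0]] * 3 + [[0.0, 2.0]] * 3 + [[0.0, 0.0]] * 2)
r = dense_reward(z_traj, z_goal)
print(stage_boundaries(r, min_jump=1.0))  # [3, 6]
```

Real reward curves are noisy, so in practice one would likely smooth the rewards or require a sustained rise before declaring a boundary; the fixed threshold here only illustrates the idea.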
Experiments on multi-step manipulation tasks show that the proposed object-centric reward improves downstream RL performance over pixel-based and graph-based baselines and yields semantically meaningful reward transitions on long-horizon tasks. These results suggest that structured object-centric reward learning is a promising way to combine imitation signals with reinforcement learning for long-horizon robotic manipulation.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 30