VIRL: Self-Supervised Visual Graph Inverse Reinforcement Learning

Published: 05 Sept 2024, Last Modified: 11 Oct 2024 · CoRL 2024 · CC BY 4.0
Keywords: Inverse Reinforcement Learning, Learning from Video, Graph Network
TL;DR: An inverse reinforcement learning method that leverages visual features and graph abstractions from videos to learn dense reward functions for reinforcement learning, demonstrating generalization to extrapolation tasks and unseen domains.
Abstract: Learning dense reward functions from unlabeled videos for reinforcement learning is scalable thanks to the vast diversity and quantity of video resources. Recent works use visual features or graph abstractions in videos to measure task progress as rewards, but these either deteriorate in unseen domains or capture spatial information while overlooking visual details. We propose $\textbf{V}$isual-Graph $\textbf{I}$nverse $\textbf{R}$einforcement $\textbf{L}$earning (VIRL), a self-supervised method that synergizes low-level visual features and high-level graph abstractions, encoding frames into graph representations for reward learning. VIRL uses a visual encoder that extracts object-wise features for graph nodes and a graph encoder that derives properties from graphs constructed from the objects detected in each frame. The encoded representations are trained to align videos temporally and reconstruct in-scene objects. The pretrained visual graph encoder is then used to construct a dense reward function for policy learning by measuring latent distances between current frames and the goal frame. Our empirical evaluation on the X-MAGICAL and Robot Visual Pusher benchmarks demonstrates that VIRL effectively handles tasks requiring both granular visual attention and broader global feature consideration, and exhibits robust generalization to $\textit{extrapolation}$ tasks and domains not seen in demonstrations. Our policy for the robotic task also achieves the highest success rate in real-world robot experiments.
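The abstract describes the reward as a latent distance between the current frame and the goal frame under the pretrained encoder. A minimal sketch of that idea, with a hypothetical `encode` function standing in for VIRL's visual graph encoder (the function name and negative-Euclidean-distance form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def latent_distance_reward(encode, frame, goal_frame):
    """Dense reward: negative distance between encoded current and goal frames.

    `encode` is a stand-in for a pretrained visual graph encoder mapping a
    frame to a latent vector; the negative L2 distance is one common choice
    for turning latent proximity into a dense reward signal.
    """
    z_t = np.asarray(encode(frame), dtype=float)
    z_g = np.asarray(encode(goal_frame), dtype=float)
    return -float(np.linalg.norm(z_t - z_g))
```

Under this construction the reward is maximal (zero) exactly when the current frame's latent matches the goal's, and grows more negative the further the agent is from the goal in latent space.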
Supplementary Material: zip
Website: https://leihhhuang.github.io/VIRL/
Student Paper: yes
Submission Number: 575