VideoDex: Learning Dexterity from Internet Videos

Kenneth Shaw; Shikhar Bahl; Deepak Pathak

Published: 10 Sept 2022, Last Modified: 27 Apr 2025

Venue: CoRL 2022 Poster

Readers: Everyone

Keywords: Dexterous Manipulation, Large Scale Robotics, Imitation Learning

TL;DR: Learning Robot Action Priors from Human Videos

Abstract: To build general robotic agents that can operate in many environments, it is often imperative for the robot to collect experience in the real world. However, this is often not feasible due to safety, time, and hardware restrictions. We thus propose leveraging the next best thing to real-world experience: internet videos of humans using their hands. Visual priors, such as visual features, are often learned from videos, but we believe that more information from videos can be utilized as a stronger prior. We build a learning algorithm, VideoDex, that leverages visual, action, and physical priors from human video datasets to guide robot behavior. These action and physical priors in the neural network dictate the typical human behavior for a particular robot task. We test our approach on a robot arm and dexterous hand based system and show strong results on many different manipulation tasks, outperforming various state-of-the-art methods. For videos and supplemental material, visit our website at https://video-dex.github.io.

Student First Author: yes

Supplementary Material: zip

Website: https://video-dex.github.io/

Community Implementations: [10 code implementations (CatalyzeX)](https://www.catalyzex.com/paper/videodex-learning-dexterity-from-internet/code)
