Keywords: representation learning, pre-training, foundation models, embodied AI, reinforcement learning
TL;DR: We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).
Abstract: We present the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI). First, we curate CORTEXBENCH, consisting of 17 different EAI tasks spanning locomotion, navigation, dexterous manipulation, and mobile manipulation. Next, we systematically evaluate existing visual foundation models and find that none is universally dominant.
To study the effect of pre-training data scale and diversity, we combine ImageNet with over 4,000 hours of egocentric videos from 7 different sources (over 5.6M images) and train vision transformers of different sizes using Masked Auto-Encoding (MAE) on slices of this data. These models required over 10,000 GPU-hours to train and will be open-sourced to the community.
We find that scaling dataset size and diversity does not uniformly improve performance across all tasks, but does improve it on average. Finally, we show that adding a second pre-training step on a small in-domain dataset improves downstream performance, matching or outperforming the best known results in this setting.
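To make the pre-training recipe concrete, below is a minimal sketch (not the authors' released code) of Masked Auto-Encoding with a toy vision transformer in PyTorch, followed by a second, shorter pre-training pass on a small in-domain dataset. The class `TinyMAE`, its hyperparameters, and the random-tensor "datasets" are illustrative assumptions; the actual study trains much larger ViTs on the curated image/video-frame pool.

```python
# Illustrative two-stage MAE pre-training sketch (assumed toy setup, not the paper's code).
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder: patchify, mask a fraction of patches, encode the
    visible patches with a small Transformer, and reconstruct masked pixels."""
    def __init__(self, img_size=224, patch=16, dim=192, depth=4, mask_ratio=0.75):
        super().__init__()
        self.patch, self.mask_ratio = patch, mask_ratio
        self.num_patches = (img_size // patch) ** 2
        patch_dim = 3 * patch * patch
        self.to_tokens = nn.Linear(patch_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, self.num_patches, dim))
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Linear(dim, patch_dim)  # lightweight pixel-reconstruction head

    def patchify(self, imgs):
        B, C, H, W = imgs.shape
        p = self.patch
        x = imgs.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 1, 3, 5).reshape(B, self.num_patches, C * p * p)

    def forward(self, imgs):
        patches = self.patchify(imgs)                    # (B, N, 3*p*p)
        tokens = self.to_tokens(patches) + self.pos      # (B, N, D)
        B, N, D = tokens.shape
        keep = int(N * (1 - self.mask_ratio))
        shuffle = torch.rand(B, N, device=imgs.device).argsort(dim=1)
        visible_idx = shuffle[:, :keep]                  # indices of unmasked patches
        gather_idx = visible_idx.unsqueeze(-1).expand(-1, -1, D)
        visible = torch.gather(tokens, 1, gather_idx)
        encoded = self.encoder(visible)                  # encode visible patches only

        # Scatter encoded tokens back; masked positions get a learned mask token.
        full = self.mask_token.expand(B, N, D).clone()
        full.scatter_(1, gather_idx, encoded)
        pred = self.decoder(full + self.pos)             # predict pixels for every patch

        mask = torch.ones(B, N, device=imgs.device)
        mask.scatter_(1, visible_idx, 0.0)               # 1 = masked patch
        loss = (((pred - patches) ** 2).mean(dim=-1) * mask).sum() / mask.sum()
        return loss

# Stage 1: MAE pre-training on the large, diverse pool of frames.
# Stage 2: a shorter second pre-training pass on a small in-domain dataset.
# Random tensors stand in for real DataLoaders over the curated data.
model = TinyMAE()
opt = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.05)
for stage_batches in (8, 2):                             # placeholder iteration counts
    for _ in range(stage_batches):
        imgs = torch.randn(2, 3, 224, 224)               # stand-in for a real image batch
        loss = model(imgs)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

The reconstruction loss is computed only on masked patches, which is what lets a high mask ratio (75% here) act as the self-supervised training signal.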
Track: Technical Paper
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/where-are-we-in-the-search-for-an-artificial/code)