Learning to Navigate Using Mid-Level Visual Priors

01 Mar 2020 (modified: 12 May 2023) · OpenReview Archive Direct Upload
Abstract: How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. navigating a complex environment)? What are the consequences of not utilizing such visual priors in learning? We study these questions by integrating a generic perceptual skill set (a distance estimator, an edge detector, etc.) within a reinforcement learning framework (see Fig. 1). This skill set ("mid-level vision") provides the policy with a more processed state of the world than raw images. Our large-scale study demonstrates that using mid-level vision results in policies that learn faster, generalize better, and achieve higher final performance than learning from scratch or using state-of-the-art visual and non-visual representation learning methods. We show that conventional computer vision objectives are particularly effective in this regard and can be conveniently integrated into reinforcement learning frameworks. Finally, we find that no single visual representation is universally useful for all downstream tasks, so we computationally derive a task-agnostic set of representations optimized to support arbitrary downstream tasks.
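The abstract describes feeding a frozen, pretrained mid-level vision network (e.g. a depth or edge estimator) into an RL policy in place of raw pixels. Below is a minimal sketch of that setup in PyTorch; `MidLevelPolicy`, `feat_dim`, and `n_actions` are illustrative names for this sketch, not the authors' released API, and the encoder is assumed to be any pretrained module that maps RGB frames to a feature map.

```python
import torch
import torch.nn as nn

class MidLevelPolicy(nn.Module):
    """Policy that consumes frozen mid-level visual features instead of raw pixels.

    A hypothetical sketch: the pretrained perceptual module (distance estimator,
    edge detector, etc.) is frozen, and only the small policy head on top of its
    features is trained by the RL algorithm.
    """

    def __init__(self, encoder: nn.Module, feat_dim: int, n_actions: int):
        super().__init__()
        self.encoder = encoder  # pretrained mid-level network, kept frozen
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        # The encoder is a fixed feature extractor; gradients only reach the head.
        with torch.no_grad():
            feats = self.encoder(rgb)
        return self.head(feats)
```

Keeping the encoder frozen is what makes the representation a "prior": the RL signal shapes only the lightweight head, so the policy starts from a processed state of the world rather than learning perception from scratch.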