FLIP-TD: Free Lunch Inpainting on Top-Down Images for Robotic Tasks

Published: 07 May 2023, Last Modified: 16 May 2023 · ICRA-23 Workshop on Pretraining4Robotics Lightning
Keywords: image prediction, semantic segmentation, robotics application, vision transformer, pretraining
TL;DR: Already pre-trained vision encoder networks can be used, without fine-tuning, for some robotics tasks.
Abstract: Robotic systems are often limited by their sensor Field of View (FoV), which makes collision-free navigation and exploration in unknown environments challenging. Humans, in contrast, handle this better because they can use prior knowledge to reason about what lies beyond their FoV. What if robots could do the same? In our proposed approach, we enhance the robot's spatial reasoning by using pre-trained masked autoencoders to predict an expanded FoV and synthesize novel views. This allows the robot to reason about unseen regions and make informed decisions for safe and efficient navigation in unknown environments. We demonstrate the effectiveness of computer vision algorithms, specifically masked autoencoders, in solving practical robotics problems without any fine-tuning, using only top-down images. Our approach is evaluated in both indoor and outdoor environments, showcasing its performance on RGB, semantic segmentation, and binary images.
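To make the idea concrete, below is a minimal sketch (not the authors' released code) of how a frozen, pre-trained masked autoencoder could inpaint the region beyond a top-down image's FoV with no fine-tuning, in the spirit of the paper's "free lunch" setting. The Hugging Face checkpoint name, the input file, the border-ring mask layout, and the mask-ratio override are illustrative assumptions.

```python
# Sketch: expand the FoV of a top-down image with a frozen, pre-trained MAE.
# Assumptions: facebook/vit-mae-base checkpoint, a hypothetical input image,
# and masking only the border ring of patches (the "unseen" region).
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTMAEForPreTraining

processor = AutoImageProcessor.from_pretrained("facebook/vit-mae-base")
model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-base")
model.eval()  # frozen weights: no fine-tuning

image = Image.open("topdown_view.png").convert("RGB")  # hypothetical input
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# HF's ViT-MAE masks the patches with the highest sampled noise, so we can
# steer masking by passing our own noise: 1.0 on the border ring (to be
# predicted), 0.0 on the visible interior.
side = model.config.image_size // model.config.patch_size  # 14 for ViT-B/16
noise = torch.zeros(1, side, side)
noise[:, 0, :] = noise[:, -1, :] = noise[:, :, 0] = noise[:, :, -1] = 1.0
noise = noise.flatten(1)

# Match the mask ratio to the fraction of border patches (assumes the
# embedding layer reads mask_ratio from this shared config at forward time).
model.config.mask_ratio = noise.sum().item() / noise.numel()

with torch.no_grad():
    outputs = model(pixel_values, noise=noise)

# Reassemble: keep visible patches from the input; fill masked patches with
# the MAE's per-patch pixel predictions (outputs.logits).
pred = model.unpatchify(outputs.logits)
mask = outputs.mask.unsqueeze(-1).repeat(1, 1, model.config.patch_size**2 * 3)
mask = model.unpatchify(mask)  # 1 where predicted, 0 where visible
expanded_view = pixel_values * (1 - mask) + pred * mask
```

In a navigation pipeline, the resulting `expanded_view` (or its semantic-segmentation counterpart) could then be handed to a planner as a hallucinated-but-informed estimate of the map beyond the sensor's FoV.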