Improved Zero-Shot Object Localization using Contextualized Prompts and Objects in Context

Published: 07 May 2023, Last Modified: 12 May 2023
ICRA-23 Workshop on Pretraining4Robotics (Lightning)
Keywords: object localization, open world robotics, language-vision model, prior knowledge
TL;DR: We improve object localization by extending GLIP with contextual knowledge to diversify the input prompts and to verify relations between objects.
Abstract: Localizing objects is an essential capability for robots to perform tasks more autonomously, for instance finding a door and its handle in order to navigate to the next room. For scalability to open-world settings, it is important to localize objects that have not been seen before (zero-shot), for instance a door with a push bar instead of a conventional handle. Pretrained large language-vision object detection models, such as GLIP, can localize a broad variety of object classes reasonably well from textual prompts and are therefore well suited to zero-shot robotics. We extend GLIP with contextual knowledge to diversify the input prompts for better recall (pre-processing) and to filter the candidate objects using relational information about objects in context for better precision (post-processing). Diversifying the prompts helps to cover variations of the object (e.g., different types of door handles). Spatial relations between objects help to verify object candidates (e.g., the handle is close to the door). This verification is done by a neuro-symbolic program, endowed with first-order logic to define the spatial relations. We show that the recall and precision of GLIP can be improved by leveraging contextual knowledge, without retraining.
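To make the two steps concrete, the following is a minimal, hypothetical Python sketch of prompt diversification (pre-processing) and spatial-relation verification (post-processing). The detection format, the helper names (`diversify_prompt`, `verify_candidates`, `near`), and the distance threshold are illustrative assumptions, not the authors' implementation or the GLIP API.

```python
# Hypothetical sketch of the two contextual-knowledge steps described in the abstract.
# The detection dict format, helper names, and thresholds are illustrative assumptions.

from typing import Callable, Dict, List, Tuple

Detection = Dict  # e.g. {"label": str, "box": (x1, y1, x2, y2), "score": float}

def diversify_prompt(target: str, context_variants: Dict[str, List[str]]) -> str:
    """Pre-processing: expand a target class into a prompt that also names
    known variants of the object (e.g., different kinds of door handles)."""
    variants = context_variants.get(target, [])
    return " . ".join([target] + variants)  # dot-separated phrases as one text prompt

def center(box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def near(a: Detection, b: Detection, max_dist: float = 100.0) -> bool:
    """First-order-logic style spatial predicate near(a, b), here a simple
    center-distance check (assumed threshold, in pixels)."""
    (ax, ay), (bx, by) = center(a["box"]), center(b["box"])
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 <= max_dist

def verify_candidates(candidates: List[Detection],
                      anchors: List[Detection],
                      relation: Callable[[Detection, Detection], bool]) -> List[Detection]:
    """Post-processing: keep only candidates that satisfy the relation with
    at least one anchor object (e.g., a handle must be near some door)."""
    return [c for c in candidates if any(relation(c, a) for a in anchors)]

# Example with made-up detections: the second "push bar" is far from any door
# and is therefore filtered out by the spatial verification.
prompt = diversify_prompt("door handle",
                          {"door handle": ["push bar", "door knob", "lever handle"]})
doors = [{"label": "door", "box": (0, 0, 200, 400), "score": 0.9}]
handles = [{"label": "push bar", "box": (150, 180, 190, 220), "score": 0.6},
           {"label": "push bar", "box": (600, 50, 640, 90), "score": 0.5}]
kept = verify_candidates(handles, doors, near)  # only the first candidate remains
```

In the method described by the abstract, the candidate detections would come from GLIP queried with the diversified prompt, and the spatial predicates would be defined by the neuro-symbolic, first-order-logic program rather than the toy `near` check shown here.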