FindThis: Language-Driven Object Disambiguation in Indoor Environments

Published: 30 Aug 2023, Last Modified: 17 Oct 2023, CoRL 2023 Poster, Readers: Everyone
Keywords: object disambiguation, instruction following, language interaction, visual navigation
TL;DR: We present a new task, dataset, and method focused on language-driven object disambiguation in indoor 3D environments.
Abstract: Natural language is inherently ambiguous. In this work, we consider interactions between a user and a mobile service robot tasked with locating a desired object specified by a language utterance. We present a task, FindThis, which addresses the problem of how to disambiguate and locate the particular object instance desired through a dialog with the user. To approach this problem, we propose an algorithm, GoFind, which exploits visual attributes of the object that may be intrinsic (e.g., color, shape) or extrinsic (e.g., location, relationships to other entities), expressed in an open vocabulary. GoFind leverages the visual common sense learned by large language models to enable fine-grained object localization and attribute differentiation in a zero-shot manner. We also provide a new visio-linguistic dataset, 3D Objects in Context (3DOC), for evaluating agents on this task; it consists of Google Scanned Objects placed in Habitat-Matterport 3D scenes. Finally, we validate our approach on a real robot operating in an unstructured physical office environment using complex fine-grained language instructions.
Student First Author: yes
Supplementary Material: zip
Instructions: I have read the instructions for authors
Publication Agreement: pdf
Poster Spotlight Video: mp4