Question Generation for Uncertainty Elimination in Referring Expressions in 3D Environments

Published: 2023, Last Modified: 04 Nov 2024ICRA 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: We introduce a new task of question generation to eliminate the uncertainty of referring expressions in 3D indoor environments (3D-REQ). Referring to an object using natural language is one of the most common occurrences in daily human conversations; therefore, instructing robots to identify a certain object using natural language could be an essential task in var-ious robotic applications, such as room arrangement. However, human instructions are sometimes uncertain. Existing research on visual grounding using natural language in a 3D environment assumes that the referring expression can uniquely identify the object and does not consider that humans unconsciously give uncertain expressions. When faced with uncertainties, humans ask questions to gain further information. Inspired by the above observation, we propose a method that reduces uncertainty by asking questions when being given an obscure referring expression. The purpose of this method is to predict the positions of all candidate objects that satisfy the referring expressions in a 3D indoor environment and then to ask the appropriate questions to narrow down the target objects from them. To achieve this, we constructed a new 3D-REQ dataset, the input of which is a referring expression with uncertainties in the 3D environment and point clouds, and the output of which is the bounding boxes of all candidate objects satisfying the referring expression and a question to eliminate the uncertainty. To the best of our knowledge, 3D-REQ is the first effort to eliminate the uncertainty of referring expressions for object grounding in 3D environments.
Loading