Differentiable Parsing and Visual Grounding of Verbal Instructions for Object Placement

Published: 15 Nov 2022, Last Modified: 05 May 2023
LangRob 2022 Poster
Keywords: Language Grounding, Human-Robot Interactions
TL;DR: We propose ParaGon for language-conditioned object placing. It is data-efficient and generalizable when learning compositional instructions, and robust to noisy, ambiguous language inputs.
Abstract: Grounding spatial relations expressed in natural language for object placement raises issues of ambiguity and compositionality. To address these issues, we introduce ParaGon, a PARsing And visual GrOuNding framework for language-conditioned object placement. ParaGon leverages object-centric relational representations for the visual grounding of natural language. It parses language instructions into relations between objects and grounds those objects in visual scenes. A particle-based GNN then performs relational reasoning between the grounded objects to generate placements. ParaGon encodes all of these procedures into neural networks for end-to-end training. Our approach inherently integrates parsing-based methods into a probabilistic, data-driven framework. It is data-efficient and generalizable for learning compositional instructions, robust to noisy language inputs, and adapts to the uncertainty of ambiguous instructions.
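To make the pipeline concrete, below is a minimal, illustrative sketch of the particle-based relational-reasoning step, not the paper's actual implementation: grounded objects are represented as particle sets over 2D positions, a parsed relation supplies a spatial offset, and the resulting particle set approximates the placement distribution. The object names, relation offsets, and noise scales are all assumptions made for illustration; in ParaGon these quantities would come from the learned parser, grounding module, and GNN.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical grounded objects: each is a set of K particles over 2D positions,
# representing uncertainty from visual grounding (names and values are illustrative).
K = 32
objects = {
    "plate": rng.normal(loc=[0.5, 0.3], scale=0.02, size=(K, 2)),
    "mug":   rng.normal(loc=[0.2, 0.6], scale=0.05, size=(K, 2)),
}

# Parsed relation from an instruction like "put the mug to the right of the plate":
# (anchor object, relation). The fixed offset here stands in for what a learned
# GNN edge function would predict from the relation phrase.
relation_offsets = {"right of": np.array([0.15, 0.0])}
anchor, relation = "plate", "right of"

# One message-passing step: each anchor particle proposes a placement particle
# by applying the relation offset plus noise (a learned noise model in practice).
proposals = objects[anchor] + relation_offsets[relation]
proposals += rng.normal(scale=0.01, size=proposals.shape)

# The particle set approximates the placement distribution: its mean gives a
# point estimate, and its spread reflects ambiguity in the instruction/grounding.
print("placement mean:", proposals.mean(axis=0))
print("placement std :", proposals.std(axis=0))
```

Keeping the placement as a particle set, rather than collapsing it to a single pose, is what lets this style of model represent the uncertainty of ambiguous instructions end to end.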