Bridging the Gap to Natural Language-based Grasp Predictions through Semantic Information Extraction

Niko Kleer, Martin Feick, Amr Gomaa, Michael Feld, Antonio Krüger

Published: 2024, Last Modified: 24 Feb 2025, IROS 2024, CC BY-SA 4.0

Abstract: Enabling multi-fingered robots to choose an appropriate grasp on an object from natural language instructions is difficult. The diversity, imprecision, and limited information contained in language make this task particularly challenging. However, speech is a natural communication interface for humans and can help robots adapt to their environment more easily. Providing robots with relevant data about the objects they interact with is therefore essential for them to understand how to carry out object manipulation tasks. Our work introduces a novel approach to text-based grasp prediction that leverages Named Entity Recognition (NER) to automatically extract semantic data. The methodology is a multistage learning approach in which a semantic information extractor provides significant features to a grasp prediction model. To assess the effectiveness of our approach, we conducted experiments on an existing corpus and two corpora generated by ChatGPT. Our results demonstrate superior performance compared to similar grasp prediction models while overcoming limitations in the literature. Additionally, we open-source our training data for reproducibility and future research.
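
To make the two-stage idea concrete, the following is a minimal sketch of a semantic extractor feeding a grasp predictor. It is not the authors' implementation: the dictionary lookup is a stand-in for a trained NER model, and the entity labels (`OBJECT`, `SIZE`, `TASK`), the vocabulary, the function names, and the grasp labels (`precision`, `power`) are all hypothetical choices made for illustration.

```python
# Hypothetical vocabulary standing in for a trained NER model (assumption).
GAZETTEER = {
    "OBJECT": {"mug", "pen", "bottle", "plate"},
    "SIZE": {"small", "large", "thin"},
    "TASK": {"pour", "write", "carry", "drink"},
}


def extract_entities(instruction: str) -> dict[str, str]:
    """Stage 1: tag tokens with semantic labels via a toy dictionary lookup."""
    entities: dict[str, str] = {}
    cleaned = instruction.lower().replace(",", " ").replace(".", " ")
    for token in cleaned.split():
        for label, vocab in GAZETTEER.items():
            # Keep only the first match per label.
            if token in vocab and label not in entities:
                entities[label] = token
    return entities


def predict_grasp(entities: dict[str, str]) -> str:
    """Stage 2: map extracted features to a grasp type with toy rules.

    A learned classifier would replace these hand-written rules.
    """
    if entities.get("TASK") == "write" or entities.get("SIZE") == "thin":
        return "precision"
    if entities.get("OBJECT") in {"mug", "bottle"}:
        return "power"
    return "unknown"


if __name__ == "__main__":
    for text in ("Pick up the pen to write a note.", "Carry the large mug."):
        features = extract_entities(text)
        print(text, features, predict_grasp(features))
```

The point of the split is that the second stage sees only structured features rather than raw text, so the extractor and the predictor can be trained or swapped independently.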