From Language to Action: Employing Foundation Models in Autonomous Robots

ACL ARR 2024 June Submission928 Authors

13 Jun 2024 (modified: 03 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Foundation models have demonstrated remarkable capabilities in natural language processing tasks, generating interest in their potential for robotic applications. However, the existing literature lacks a transparent and comprehensive synthesis of these advancements. This paper uses the PRISMA framework to systematically review and explore the integration of foundation models in robotic applications. Through an in-depth analysis of 76 studies, we investigate current trends in models, modalities, and experimental methods. Additionally, this study maps the state-of-the-art applications of foundation models in robotics tasks and illustrates how these tasks are interconnected. Synthesizing these findings, we identify key challenges and future directions. This study establishes a benchmark and offers insights into future research directions for developing safe and autonomous embodied foundation models. All data and findings are available in the project repository.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: Vision language navigation, Multimodality, Cross-modal pretraining, Cross-modal application
Contribution Types: Surveys
Languages Studied: English
Submission Number: 928