Abstract: Embodied instruction following (EIF) is a challenging task in Embodied AI that requires robots to possess a range of capabilities, including language understanding, object identification, environmental exploration, action planning, and accurate manipulation. To investigate household robot tasks expected to be meaningful in the near future, the community has developed preliminary solutions based on modular methods, but their overall performance remains far below that of humans. Completing tasks in unfamiliar, unseen environments poses a significant challenge for intelligent robots. Error analysis on the ALFRED dataset indicates that the modular structure falls short in target comprehension and vision-based interaction. We therefore propose a post-processing optimization approach that combines environment-information alignment via semantic matching with visual-interaction enhancement via pose adjustment. Evaluated on two interactive datasets, our paradigm improves performance by an average of 37.12% (relative) over the baseline, suggesting that contextual information from the current environment effectively increases the adaptability of robots.