Incorporating Task Progress Knowledge for Subgoal Generation in Robotic Manipulation through Image Edits

Published: 01 Jan 2025, Last Modified: 13 May 2025 · WACV 2025 · CC BY-SA 4.0
Abstract: Understanding the progress of a task allows humans not only to track what has been done but also to better plan for future goals. We present TaKSIE, a novel framework that incorporates task progress knowledge into visual subgoal generation for robotic manipulation tasks. We jointly train a recurrent network with a latent diffusion model to generate the next visual subgoal based on the robot's current observation and the input language command. At execution time, the robot leverages a visual progress representation to monitor the task progress and adaptively samples the next visual subgoal from the model to guide the manipulation policy. We train and validate our model on simulated and real-world robotic tasks, achieving state-of-the-art performance on the CALVIN manipulation benchmark. We find that incorporating task progress knowledge improves the robustness of the trained policy to different initial robot poses and varying movement speeds in the demonstrations. The project page is available at https://live-robotics-uva.github.io/TaKSIE/.
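The abstract describes an execution-time loop in which a visual progress signal decides when to sample the next subgoal from the generative model. The following is a minimal, hedged sketch of that loop, not the authors' implementation: the module names, dimensions, the threshold rule, and the feed-forward stand-in for the latent diffusion sampler are all illustrative assumptions introduced here.

```python
# Hedged sketch (not the authors' code): a runnable outline of the execution-time
# loop described in the abstract -- monitor a visual progress score and, when the
# current subgoal appears completed, sample the next visual subgoal to condition
# the manipulation policy. All names, dimensions, and the threshold heuristic are
# assumptions; the real system jointly trains a recurrent network with a latent
# diffusion model.

import torch
import torch.nn as nn

class ProgressMonitor(nn.Module):
    """Recurrent tracker mapping observation embeddings to a progress score in [0, 1]."""
    def __init__(self, obs_dim=64, hidden_dim=128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, obs_emb, h=None):
        out, h = self.gru(obs_emb.unsqueeze(1), h)      # (B, 1, hidden)
        return self.head(out[:, -1]), h                 # progress score, recurrent state

class SubgoalGenerator(nn.Module):
    """Stand-in for the latent diffusion model: produces a subgoal latent
    conditioned on the current observation and language command embeddings."""
    def __init__(self, obs_dim=64, lang_dim=32, goal_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + lang_dim, 128),
                                 nn.ReLU(), nn.Linear(128, goal_dim))

    def forward(self, obs_emb, lang_emb):
        return self.net(torch.cat([obs_emb, lang_emb], dim=-1))

def control_loop(env_step, obs_emb, lang_emb, policy, steps=50, threshold=0.8):
    """Adaptive subgoal resampling: request a new subgoal once progress passes a threshold."""
    monitor, generator = ProgressMonitor(), SubgoalGenerator()
    h, subgoal = None, generator(obs_emb, lang_emb)
    for _ in range(steps):
        progress, h = monitor(obs_emb, h)
        if progress.item() > threshold:                 # current subgoal likely reached
            subgoal = generator(obs_emb, lang_emb)      # sample the next visual subgoal
        action = policy(obs_emb, subgoal)               # subgoal-conditioned manipulation policy
        obs_emb = env_step(action)                      # advance environment, get new observation
    return subgoal

if __name__ == "__main__":
    torch.manual_seed(0)
    obs, lang = torch.randn(1, 64), torch.randn(1, 32)
    dummy_policy = lambda o, g: torch.tanh(o + g)           # placeholder policy
    dummy_env = lambda a: a + 0.01 * torch.randn_like(a)    # placeholder environment step
    control_loop(dummy_env, obs, lang, dummy_policy)
```

The sketch only illustrates the control flow: progress monitoring gates when the subgoal is refreshed, so the policy is always conditioned on a subgoal appropriate to how far the task has advanced.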