CrayonRobo: Toward Generic Robot Manipulation via Crayon Visual Prompting

Xiaoqi Li; Lingyun Xu; Jiaming Liu; Mingxu Zhang; Jiahui Xu; Siyuan Huang; Iaroslav Ponomarenko; Yan Shen; Shanghang Zhang; Hao Dong

18 Sept 2024 (modified: 14 Nov 2024); ICLR 2025 Conference Withdrawn Submission; CC BY 4.0

Keywords: Robotic manipulation

Abstract: In robotic manipulation, a task goal can be conveyed in several ways, including language conditions, goal images, and goal videos. However, natural language can be ambiguous, and images or videos can be over-specified. To address this issue, we propose an approach using a straightforward and practical representation: crayon visual prompts, which explicitly indicate both low-level actions and high-level planning. Specifically, for each atomic step, our method allows drawing simple yet expressive 2D visual prompts on RGB images to represent the required actions, i.e., the end-effector pose and moving direction. We devise a training strategy that enables the model to comprehend each color prompt and predict the contact pose along with the movement direction in SE(3) space. Furthermore, we design an interaction strategy that leverages the predicted movement direction to form a trajectory connecting the sequence of atomic steps, thereby completing the long-horizon task. By providing model-understandable crayon visual prompts, either drawn by a human or generated automatically, we enable the model to explicitly understand its task objective and improve its generalization to unseen tasks. We evaluate our method in both simulation and real-world environments, demonstrating its promising performance.
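The abstract's idea of using a predicted movement direction to form a trajectory across atomic steps can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `step_waypoints` and `chain_steps`, the fixed travel `distance`, the waypoint count `n`, and the use of straight-line position-only waypoints (no orientation) are all assumptions made purely for illustration.

```python
import math


def step_waypoints(contact, direction, distance=0.10, n=5):
    """Hypothetical helper: n + 1 evenly spaced 3D positions that start at
    the contact point and travel `distance` along the (normalized) direction."""
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    unit = [c / norm for c in direction]
    return [
        tuple(p + (distance * i / n) * u for p, u in zip(contact, unit))
        for i in range(n + 1)
    ]


def chain_steps(steps, distance=0.10, n=5):
    """Hypothetical helper: concatenate the waypoints of each atomic step,
    given as (contact_point, movement_direction) pairs, in order."""
    trajectory = []
    for contact, direction in steps:
        trajectory.extend(step_waypoints(contact, direction, distance, n))
    return trajectory


# Two illustrative atomic steps: move along +x, then along +y from a new contact.
plan = chain_steps(
    [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
     ((1.0, 1.0, 1.0), (0.0, 2.0, 0.0))],
    distance=0.2,
    n=4,
)
```

Here each atomic step contributes its own short segment, and the long-horizon plan is simply the ordered concatenation of those segments; a real system would also need end-effector orientation and motion planning between segments.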

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Submission Number: 1474
