Vision-Based Grasping through Goal-Conditioned Masking

Huiyi Wang, Fahim Shahriar, Gautham Vasan, Seyed Alireza Azimi, A. Rupam Mahmood, Colin Bellinger

28 Sept 2024 (modified: 17 Dec 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Goal-Conditioned Reinforcement Learning, Robotic Reaching and Grasping, Masking-Based Goal Representation, Visual Goal Recognition, Out-of-Distribution Object Generalization

Abstract: Goal-conditioned reinforcement learning for robotic reaching and grasping enables agents to achieve diverse objectives with a single policy, using goal representations such as images, vectors, and text. Existing methods, however, carry inherent limitations: vector-based one-hot encodings restrict the agent to a predetermined object set, while the goal-state images required by image-based goal conditioning can be hard to obtain in the real world and may limit generalization to novel objects. This paper introduces a mask-based goal conditioning method that provides object-agnostic visual cues to promote efficient feature sharing and robust generalization. The agent receives a text-based goal directive and uses a pre-trained object detection model to generate a mask that serves as the goal condition and facilitates generalization to out-of-distribution objects. In addition, we show that the mask can improve sample efficiency by augmenting sparse rewards without requiring privileged information about the target location, unlike distance-based reward shaping. We demonstrate the effectiveness of the proposed framework on a simulated reach-and-grasp task. Mask-based goal conditioning consistently maintains a $\sim$90\% success rate when grasping both in-distribution and out-of-distribution objects. Furthermore, the mask-augmented reward yields a learning speed and grasping success rate on par with a distance-based reward.
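The mask idea in the abstract can be illustrated with a small sketch. The abstract does not specify the mask format or the reward formula, so everything below is an assumption made for illustration: the detector is assumed to return an axis-aligned bounding box, the mask is a binary grid built from that box, and the reward bonus is taken to be proportional to the fraction of the image the mask covers. The names `box_to_mask`, `mask_coverage`, `mask_augmented_reward`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical detector output: (x_min, y_min, x_max, y_max) in pixels,
# with the max coordinates exclusive. None means nothing was detected.
Box = Tuple[int, int, int, int]
Mask = List[List[int]]


def box_to_mask(box: Optional[Box], height: int, width: int) -> Mask:
    """Build a binary height x width mask with 1s inside the box."""
    mask = [[0] * width for _ in range(height)]
    if box is None:
        return mask  # no detection: the goal channel is all zeros
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds before filling it in.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[y][x] = 1
    return mask


def mask_coverage(mask: Mask) -> float:
    """Fraction of pixels covered by the mask, in [0, 1]."""
    total = len(mask) * len(mask[0])
    return sum(map(sum, mask)) / total


def mask_augmented_reward(mask: Mask, grasp_success: bool,
                          weight: float = 0.1) -> float:
    """Sparse success reward plus an assumed mask-coverage bonus.

    The bonus form is an illustrative assumption, not the paper's formula.
    It uses only the mask, so no ground-truth target position is needed.
    """
    sparse = 1.0 if grasp_success else 0.0
    return sparse + weight * mask_coverage(mask)


if __name__ == "__main__":
    goal_mask = box_to_mask((1, 1, 3, 3), height=4, width=4)
    print(mask_coverage(goal_mask))
    print(mask_augmented_reward(goal_mask, grasp_success=False))
```

In such a setup the mask would typically be stacked with the camera image as an extra input channel for the policy. Because the mask carries no object identity, the same policy input format applies to objects not seen during training, provided the detector can localize them.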

Primary Area: applications to robotics, autonomy, planning

Submission Number: 13353
