Reinforcement Learning-Based Grasping via One-Shot Affordance Localization and Zero-Shot Contrastive Language-Image Learning

Published: 10 Jan 2024, Last Modified: 02 Feb 2026 | 2024 IEEE/SICE International Symposium on System Integration (SII) | Everyone | CC BY 4.0
Abstract: We present a novel robotic grasping system, built around a caging-style gripper, that combines one-shot affordance localization with zero-shot object identification. We demonstrate an integrated system that requires minimal prior knowledge and focuses on flexible, few-shot, object-agnostic approaches. To grasp a novel target object, the system takes as input the color and depth of the scene, an image of an object affordance similar to that of the target object, and a text prompt of up to three words describing the target object. We demonstrate the system through real-world grasping of objects from the YCB benchmark set, with four distractor objects cluttering the scene. Overall, our pipeline achieves success rates of 96% for affordance localization, 62.5% for object identification, and 72% for grasping. Videos are available on the project website: https://sites.google.com/view/rl-affcorrs-grasp.