PLOT: Human-Like Push-Grasping Synergy Learning in Clutter With One-Shot Target Recognition

Xiaoge Cao, Tao Lu, Liming Zheng, Yinghao Cai, Shuo Wang

Published: 2024, Last Modified: 05 Oct 2024IEEE Trans. Cogn. Dev. Syst. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: In unstructured environments, robotic grasping tasks are frequently required to interactively search for and retrieve specific objects from a cluttered workspace under the condition that only partial information about the target is available, like images, text descriptions, 3-D models, etc. It is a great challenge to correctly recognize the targets with limited information and learn synergies between different action primitives to grasp the targets from densely occluding objects efficiently. In this article, we propose a novel human-like push-grasping method that could grasp unknown objects in clutter using only one target RGB with Depth (RGB-D) image, called push-grasping synergy learning in clutter with one-shot target recognition (PLOT). First, we propose a target recognition (TR) method which automatically segments the objects both from the query image and workspace image, and extract the robust features of each segmented object. Through the designed feature matching criterion, the targets could be quickly located in the workspace. Second, we introduce a self-supervised target-oriented grasping system based on synergies between push and grasp actions. In this system, we propose a salient Q (SQ)-learning framework that focuses the Q value learning in the area including targets and a coordination mechanism (CM) that selects the proper actions to search and isolate the targets from the surrounding objects, even in the condition of targets invisible. Our method is inspired by the working memory mechanism of human brain and can grasp any target object shown through the image and has good generality in application. Experimental results in simulation and real-world show that our method achieved the best performance compared with the baselines in finding the unknown target objects from the cluttered environment with only one demonstrated target RGB-D image and had the high efficiency of grasping under the synergies of push and grasp actions.