Smaller and Faster Robotic Grasp Detection Model via Knowledge Distillation and Unequal Feature Encoding
Abstract: To achieve higher accuracy, grasp detection networks have grown increasingly complex, with intricate model structures and enormous parameter counts. Although various lightweight strategies have been adopted, directly designing a compact network can be sub-optimal, and it is difficult to strike a balance between accuracy and model size. To solve this problem, we explore a more efficient grasp detection model from two aspects: elaborately designing a lightweight network and performing knowledge distillation on the designed network. Specifically, on top of the designed lightweight backbone, the features from RGB and depth (D) images, which carry unequal proportions of effective grasping information, are fully utilized, and information compensation strategies are adopted to keep the model small while maintaining its accuracy. Then, the grasping features contained in the large teacher model are adaptively and effectively learned by the proposed method via knowledge distillation. Experimental results indicate that the proposed method achieves performance comparable to more complicated models (98.9%, 93.1%, 82.3%, and 90.0% on the Cornell, Jacquard, GraspNet, and MultiObj datasets, respectively) while reducing the parameter count from MBs to KBs. Real-world robotic grasping experiments on an embedded AI computing device also prove the effectiveness of this approach.
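As context for the distillation step mentioned above, the sketch below shows the standard response-based knowledge distillation objective (a softened teacher-student KL term combined with the hard-label task loss). This is an illustrative NumPy sketch of the generic technique, not the paper's actual loss; the function name, temperature, and weighting are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax with a temperature that softens the distribution."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.7):
    """Generic KD loss: alpha * KL(teacher || student at T) + (1-alpha) * CE.

    The KL term is scaled by T^2 so its gradient magnitude stays
    comparable to the hard-label term (standard practice in KD).
    All hyperparameter values here are illustrative assumptions.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = np.mean(np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=1)) * temperature ** 2
    # Hard-label cross-entropy on the student's unsoftened outputs
    probs = softmax(student_logits)
    hard = -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))
    return alpha * kd + (1 - alpha) * hard
```

In the paper's setting the "logits" would be the grasp-quality/angle/width maps predicted by the teacher and student networks rather than class scores, but the weighting structure of the objective is the same.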