Ground4Act: Leveraging visual-language model for collaborative pushing and grasping in clutter

Yuxiang Yang, Jiangtao Guo, Zilong Li, Zhiwei He, Jing Zhang

Published: 2024, Last Modified: 13 Nov 2024, Image Vis. Comput. 2024, CC BY-SA 4.0

Abstract: Highlights
• Ground4Act is developed via visual grounding for target-oriented tasks in clutter.
• DQN-based policy achieves pushing of non-target objects to make the target graspable.
• Push and grasp in a uniform format for common deployment from simulation to reality.