DFGAP: Towards Depth-Free Cross-Category GAParts Perception via Uncertainty-Quantified Modeling

Published: 29 Oct 2025 · Last Modified: 28 Jan 2026 · ACM MM 2025 · CC BY 4.0
Abstract: Cross-category object perception is an essential upstream task for generalizable robot object interaction and manipulation. Recently, a growing number of researchers have focused on investigating visual Generalizable and Actionable Parts (GAParts) understanding at the cross-category level. However, these works are built upon RGB-D or point cloud input, and therefore rely on depth capture. When depth cameras perform poorly, e.g., on transparent or light-absorbing materials, perception algorithms that do not require depth information are urgently needed. In this paper, we propose DFGAP, a novel depth-free framework for RGB-based GAPart segmentation and pose estimation. Specifically, we independently model the ill-posed problems arising from the absence of depth for GAPart segmentation and pose estimation, by explicitly quantifying the pixel-wise segmentation probability and relative depth. This reduces uncertainty and benefits learning in both tasks. Experimental results demonstrate the superior performance and robustness of DFGAP. Our work provides a new research paradigm for GAParts perception, and we believe it has enormous potential for application across many areas of embodied AI systems.
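The abstract does not specify how the pixel-wise uncertainty is quantified, but one common way to realize such uncertainty-quantified depth regression is a heteroscedastic (aleatoric) loss in which the network predicts a per-pixel log-variance alongside the relative depth. The sketch below is an assumption for illustration, not the paper's actual formulation; the function name and the flat per-pixel lists are hypothetical.

```python
import math

def uncertainty_weighted_loss(pred_depth, log_var, target_depth):
    """Heteroscedastic regression loss over per-pixel predictions:
    each squared depth error is scaled by the predicted precision
    exp(-log_var), so the model can down-weight pixels where depth is
    ill-posed (e.g. transparent surfaces); the 0.5 * log_var penalty
    stops it from simply claiming high uncertainty everywhere."""
    terms = [
        0.5 * math.exp(-s) * (p - t) ** 2 + 0.5 * s
        for p, s, t in zip(pred_depth, log_var, target_depth)
    ]
    return sum(terms) / len(terms)

# A confidently wrong pixel costs more than the same error flagged
# as uncertain, which is how such a loss "reduces uncertainty" during
# training: gradients concentrate on pixels the model trusts.
pred   = [1.0, 2.0, 5.0]
target = [1.0, 2.0, 2.0]
confident = uncertainty_weighted_loss(pred, [0.0, 0.0, 0.0], target)
hedged    = uncertainty_weighted_loss(pred, [0.0, 0.0, 2.0], target)
```

Under this assumed formulation, `hedged < confident`: admitting uncertainty on the erroneous third pixel lowers the loss relative to a confident prediction of the same value.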