Leveraging Multiscale Features and Dynamic Graph Convolution for Robust 6D Pose Estimation

Haodu Zhang, Huafeng Wang, Wanquan Liu, Kexin Guo, Weifeng Lv, Jun Zhang

Published: 01 Jan 2025, Last Modified: 05 Nov 2025, IEEE Transactions on Instrumentation and Measurement, CC BY-SA 4.0

Abstract: Six-degree-of-freedom (6D) pose estimation from RGB-D data remains a challenging task, largely because multimodal features are fused suboptimally and depth information is used ineffectively. Most existing methods struggle with misalignment between features from different modalities and neglect the importance of multiscale context. To address these issues, we propose a framework, pyramid network (PYN), that leverages multiscale feature fusion and a novel multihedron dynamic graph convolutional network (MHDGCN). PYN aligns multimodal features across different semantic levels, while MHDGCN efficiently extracts features from irregular depth data, especially in low-quality scenarios. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods, highlighting its robustness and accuracy in challenging 6D pose estimation tasks. Please refer to https://github.com/ZEROhands/MSF-MHDGCN-Pose for details.
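
The abstract does not describe the internals of MHDGCN, so the following is only a generic illustration of what "dynamic graph convolution" on irregular point data usually means: a k-nearest-neighbor graph is rebuilt from the current features at every layer, and each point aggregates features over its edges. This minimal NumPy sketch follows the widely used EdgeConv-style formulation and is not the authors' method; the function names `knn_indices` and `edge_conv`, the random weights, and all sizes are hypothetical.

```python
import numpy as np

def knn_indices(feats, k):
    """Return the indices of the k nearest neighbors of each row (self excluded)."""
    diff = feats[:, None, :] - feats[None, :, :]      # (N, N, C) pairwise differences
    dist2 = np.sum(diff ** 2, axis=-1)                # (N, N) squared distances
    np.fill_diagonal(dist2, np.inf)                   # a point is not its own neighbor
    return np.argsort(dist2, axis=1)[:, :k]           # (N, k)

def edge_conv(feats, k, weight):
    """One EdgeConv-style layer (assumed form, not MHDGCN).

    feats:  (N, C) per-point features
    weight: (2C, C_out) linear map applied to each edge feature
    """
    idx = knn_indices(feats, k)                       # graph rebuilt from current features
    neighbors = feats[idx]                            # (N, k, C)
    center = np.repeat(feats[:, None, :], k, axis=1)  # (N, k, C)
    edge = np.concatenate([center, neighbors - center], axis=-1)  # (N, k, 2C)
    out = np.maximum(edge @ weight, 0.0)              # linear map + ReLU per edge
    return out.max(axis=1)                            # max-pool over neighbors -> (N, C_out)

rng = np.random.default_rng(0)
points = rng.normal(size=(128, 3))                    # stand-in for back-projected depth points
w1 = rng.normal(size=(6, 16)) * 0.1                   # hypothetical, untrained weights
w2 = rng.normal(size=(32, 32)) * 0.1

h1 = edge_conv(points, k=8, weight=w1)                # neighbors chosen in 3D coordinate space
h2 = edge_conv(h1, k=8, weight=w2)                    # neighbors re-chosen in feature space
print(h1.shape, h2.shape)
```

Because the second layer selects neighbors in feature space rather than in the original 3D space, the graph changes from layer to layer, which is what distinguishes a dynamic graph convolution from one on a fixed mesh or grid.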

External IDs: doi:10.1109/tim.2025.3566827