Abstract: Six-degree-of-freedom (6D) pose estimation from RGB-D data remains a challenging task, largely because multimodal features are fused suboptimally and depth information is used ineffectively. Existing methods often struggle with misalignment between features from different modalities and neglect the importance of multiscale context. To address these issues, we propose a novel framework, the pyramid network (PYN), which combines multiscale feature fusion with a novel multihedron dynamic graph convolutional network (MHDGCN). PYN aligns multimodal features across different semantic levels, while MHDGCN extracts features from irregular depth data, especially in low-quality scenarios. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods, highlighting its robustness and accuracy in challenging 6D pose estimation tasks. Please refer to https://github.com/ZEROhands/MSF-MHDGCN-Pose for details.
External IDs: doi:10.1109/tim.2025.3566827