Abstract: Direct pose estimation networks regress the 6D poses of target objects in a scene image directly with a neural network. These methods are efficient and optimize the target pose directly, giving them significant potential for practical applications. However, because the mappings between input features and target pose parameters are complex and implicit, direct methods are difficult to train and prone to overfitting on the mappings seen during training, which limits their effectiveness and their generalization to unseen mappings. Existing methods focus primarily on improving the network architecture and training strategies, and pay less attention to the mappings themselves. In this work, we propose a geometric constraints learning approach that enables networks to explicitly capture and utilize the geometric mappings between inputs and optimization targets for pose estimation. Specifically, we introduce a residual pose transformation formula that preserves pose transformation constraints in both the 2D image plane and 3D space while decoupling the absolute pose distribution, thereby addressing the pose distribution gap. Based on this formula, we further design the Geo6D mechanism, which reconstructs the network's inputs and outputs so that the network can explicitly utilize geometric constraints for pose estimation. We select two different methods as baselines, and extensive experiments show that Geo6D improves their performance and reduces their dependence on large amounts of training data, remaining effective with only 10% of the typical data volume.
External IDs: dblp:journals/tmm/ChenSZBHLJZWJ25