Keywords: Autonomous driving, Visual Odometry, Deep Learning
Abstract: Learning-based visual odometry remains vulnerable to outliers, particularly dynamic objects and background regions irrelevant to motion. It also suffers from weak modeling of inter-frame geometry, that is, the explicit pixel-wise correspondences that provide reliable motion cues. Many methods still rely on frame concatenation followed by convolutions, which tends to overlook fine-grained pixel correspondences across frames. This paper presents FGD-VO, an end-to-end monocular VO framework that addresses these issues within a unified architecture. First, we introduce a Flow-Guided Deformable Correlation (FGDC) module, which leverages dense optical flow to locate pixel correspondences between consecutive frames and augments them with learnable local offsets, enabling the aggregation of geometrically relevant features beyond fixed pixel matches. Second, we propose a Hybrid Masking strategy that combines an explicit optical-flow consistency mask with an implicit, learnable attention mask, allowing the network to simultaneously suppress unreliable correspondences and adaptively emphasize informative regions during pose refinement. Extensive experiments on the KITTI odometry benchmark demonstrate that FGD-VO achieves state-of-the-art accuracy among learning-based VO methods, significantly reducing both translational and rotational errors. Our findings suggest that explicitly coupling flow-guided deformable correlation with hybrid masking is a promising direction for improving the reliability and generalization of real-time visual odometry in autonomous systems. We will release our source code to facilitate reproducibility and future research.
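To make the flow-guided correlation idea concrete, here is a minimal, hypothetical sketch of the sampling pattern it describes: each pixel in frame t is correlated with features from frame t+1 sampled at the flow-displaced position plus a set of local offsets. This is a drastic simplification (single-channel features, nearest-neighbor sampling, fixed rather than learned offsets) and is not the paper's actual FGDC implementation; the function and parameter names are illustrative only.

```python
def clamp(v, lo, hi):
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))

def flow_guided_correlation(feat_t, feat_t1, flow, offsets):
    """Correlate feat_t[y][x] with feat_t1 sampled at (y, x) + flow + offset.

    feat_t, feat_t1 : H x W lists of floats (single-channel feature maps)
    flow            : H x W list of (dy, dx) flow vectors from frame t to t+1
    offsets         : list of (dy, dx) local offsets around the flow target
                      (stand-in for the learnable offsets in the abstract)
    Returns an H x W correlation map averaged over the offsets.
    """
    H, W = len(feat_t), len(feat_t[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            fy, fx = flow[y][x]
            acc = 0.0
            for dy, dx in offsets:
                # Nearest-neighbor sample in frame t+1, clamped to the image.
                sy = clamp(int(round(y + fy + dy)), 0, H - 1)
                sx = clamp(int(round(x + fx + dx)), 0, W - 1)
                acc += feat_t[y][x] * feat_t1[sy][sx]
            out[y][x] = acc / len(offsets)
    return out
```

With zero flow and a single (0, 0) offset this degenerates to an elementwise product of the two feature maps; nonzero flow shifts the sampling window, which is the mechanism the abstract credits with aggregating geometrically relevant features beyond fixed pixel matches.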
Primary Area: applications to robotics, autonomy, planning
Supplementary Material: zip
Submission Number: 3507