M3DOnline: Foundation-Prior Guided Monocular 3D Motion Learning for Autonomous Driving in Novel Scenes
Keywords: Normalized Scene Flow; Autonomous driving; 3D Vision
TL;DR: Leverage beneficial prior knowledge from large foundation models to address the weaknesses of existing self-supervised methods.
Abstract: We propose M3DOnline, a learning framework for normalized scene flow (NSF). NSF represents the dense 3D motion of pixels between two frames and plays a critical role in various monocular 3D vision tasks. Existing self-supervised NSF methods rely heavily on strong visual cues, which limits their performance on non-Lambertian surfaces and around motion boundaries.
Our key insight is to leverage useful priors from foundation models to overcome the inherent limitations of texture-based matching in traditional self-supervised methods. Specifically, we design a pseudo-label generation pipeline built on semantic and depth foundation models: we partition real-world scenes into semantic segments and, under a piecewise-rigid motion assumption, generate a 3D motion pseudo-label for each segment.
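To make the pseudo-label step concrete, here is a minimal sketch of what such a pipeline might look like: given per-pixel 3D points back-projected from a depth foundation model in two frames and segment ids from a semantic segmentation model, each segment's motion is fit with a least-squares rigid transform (the Kabsch algorithm). All function and variable names are hypothetical; this illustrates the general idea under the stated assumptions, not the paper's actual implementation.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid fit (Kabsch): find R, t with dst ~= src @ R.T + t.

    src, dst: (N, 3) corresponded 3D points of one segment in frames t and t+1.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)           # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

def segment_motion_pseudo_labels(points_t, points_t1, segments):
    """Per-segment 3D motion pseudo-labels under a piecewise-rigid assumption.

    points_t, points_t1: (H, W, 3) back-projected 3D points (e.g. from a depth
      foundation model), corresponded across the two frames.
    segments: (H, W) integer segment ids from a semantic segmentation model.
    Returns an (H, W, 3) pseudo scene-flow field.
    """
    flow = np.zeros_like(points_t)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        if mask.sum() < 3:                        # too few points for a stable fit
            continue
        R, t = estimate_rigid_transform(points_t[mask], points_t1[mask])
        moved = points_t[mask] @ R.T + t          # rigidly transformed points
        flow[mask] = moved - points_t[mask]       # 3D motion pseudo-label
    return flow
```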
To handle inevitable non-rigid regions and reduce the impact of inaccurate predictions from foundation models, we introduce a loss-based adaptive learning strategy that filters out clearly non-rigid areas and dynamically adjusts the learning weights and the supervised regions according to label quality.
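A loss-based adaptive weighting scheme of the kind described could look like the following sketch: pixels whose residual against the pseudo-label is large relative to a batch quantile are treated as likely non-rigid or mislabeled and softly down-weighted. The quantile threshold, the sigmoid weighting, and all names are assumptions for illustration, not the paper's actual formulation.

```python
import torch

def adaptive_pseudo_label_loss(pred_flow, pseudo_flow, valid,
                               quantile=0.9, sharpness=10.0):
    """Loss-based adaptive weighting (illustrative; names are hypothetical).

    pred_flow, pseudo_flow: (B, 3, H, W) predicted / pseudo 3D motion.
    valid: (B, 1, H, W) binary mask of pixels that received a pseudo-label.
    """
    residual = (pred_flow - pseudo_flow).norm(dim=1, keepdim=True)  # (B,1,H,W)
    # Batch-level residual quantile as an adaptive quality threshold.
    thresh = torch.quantile(residual[valid.bool()], quantile).detach()
    # Soft weight in (0, 1): ~1 below the threshold, -> 0 well above it,
    # so suspect (non-rigid / mislabeled) pixels contribute little.
    weight = torch.sigmoid(sharpness * (1.0 - residual.detach() / (thresh + 1e-6)))
    weight = weight * valid
    return (weight * residual).sum() / weight.sum().clamp(min=1.0)
```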
Experiments show that M3DOnline significantly improves motion boundary estimation and the handling of reflective and transparent surfaces, demonstrating the advantage of integrating foundation-model priors into self-supervised scene flow learning. Code will be made available.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 18723