DifFlow3D: Hierarchical Diffusion Models for Uncertainty-Aware 3D Scene Flow Estimation

Jiuming Liu, Weicai Ye, Guangming Wang, Chaokang Jiang, Lei Pan, Jinru Han, Zhe Liu, Guofeng Zhang, Hesheng Wang

Published: 2026, Last Modified: 18 Mar 2026, Venue: IEEE Trans. Pattern Anal. Mach. Intell. 2026, License: CC BY-SA 4.0

Abstract: 3D scene flow represents the dense per-point motion field in dynamic scenes and plays a crucial role in downstream tasks such as motion segmentation, dynamic scene reconstruction, and 4D content generation. However, previous regression-based works commonly suffer from unreliable correlations caused by locally constrained search ranges, and they lack timely feedback on flow estimation uncertainty during training. To address these challenges, we propose DifFlow3D, a novel uncertainty-aware network for scene flow estimation based on the conditional probabilistic diffusion model. Hierarchical diffusion-based flow estimation blocks are designed to enhance correlation robustness and resilience to challenging cases, e.g., dynamics, noisy inputs, and repetitive patterns. To limit the generation diversity of the diffusion process, three key flow-related features are leveraged as conditions in our diffusion model. Furthermore, we develop an uncertainty estimation module within the diffusion process to dynamically assess the reliability of the estimated scene flow. A Hidden State Denoising (HSD) strategy is also introduced to further improve the stability of the reverse denoising process. Extensive experiments on four scene flow datasets, covering both synthetic and real-world data (FlyingThings3D, KITTI 2015, Argoverse, and Waymo Open), demonstrate the superiority of DifFlow3D. Compared to prior state-of-the-art methods, DifFlow3D reduces EPE3D by 26.0%, 36.4%, 35.3%, and 17.7% on the four datasets, respectively. Trained only on the synthetic FlyingThings3D dataset, our method achieves an unprecedented millimeter-level accuracy (0.0070 m EPE3D) on the real-scene KITTI dataset, highlighting its exceptional generalization capability. Additionally, our diffusion-based refinement paradigm can be seamlessly integrated as a plug-and-play module into existing scene flow networks, significantly enhancing their estimation accuracy.
We also introduce our pre-trained scene flow estimator as an explicit motion prior into the novel dynamic LiDAR view synthesis task, which validates its potential for improving 4D LiDAR reconstruction performance.
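
For readers unfamiliar with the metric quoted in the abstract, EPE3D (3D end-point error) is commonly computed as the mean Euclidean distance, in meters, between predicted and ground-truth flow vectors. The sketch below illustrates that common definition and how a relative reduction percentage can be derived from two EPE3D values. It is not the authors' evaluation code: the function names `epe3d` and `relative_reduction` are hypothetical, and the toy values are made up for illustration rather than taken from the paper.

```python
import math


def epe3d(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth 3D flow vectors.

    Both arguments are sequences of (x, y, z) tuples, one per point.
    Hypothetical helper for illustration, not the paper's evaluation code.
    """
    if len(pred_flow) != len(gt_flow) or not pred_flow:
        raise ValueError("flows must be non-empty and of equal length")
    total = 0.0
    for p, g in zip(pred_flow, gt_flow):
        total += math.dist(p, g)  # per-point end-point error
    return total / len(pred_flow)


def relative_reduction(baseline_epe, new_epe):
    """Percentage by which new_epe is lower than baseline_epe."""
    return 100.0 * (baseline_epe - new_epe) / baseline_epe


# Toy data, invented for this example only.
pred = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
gt = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]

toy_epe = epe3d(pred, gt)                      # (0.0 + 0.5) / 2
toy_gain = relative_reduction(0.5, toy_epe)    # improvement over a made-up baseline of 0.5 m
```

Under this definition, the 0.0070 m EPE3D reported for KITTI corresponds to an average per-point flow error of 7 mm.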

External IDs: dblp:journals/pami/LiuYWJPHLZW26