The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

Published: 21 Sept 2023, Last Modified: 02 Nov 2023, NeurIPS 2023 oral
Keywords: Monocular depth, optical flow, diffusion, depth, flow
TL;DR: Advances in denoising diffusion to handle the limited, noisy, and incomplete labels of dense vision tasks, specifically monocular depth estimation and optical flow, achieving state-of-the-art results.
Abstract: Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel at estimating optical flow and monocular depth, surprisingly without the task-specific architectures and loss functions that predominate for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. With self-supervised pre-training, the combined use of synthetic and real data for supervised training, and technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy, incomplete training data, one can train state-of-the-art diffusion models for depth and optical flow estimation, with additional zero-shot coarse-to-fine refinement for high-resolution estimates. Extensive experiments focus on quantitative performance against benchmarks, ablations, and the model's ability to capture uncertainty and multimodality and to impute missing values. Our model obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all score of 3.26% on the KITTI optical flow benchmark, about 25% better than the best published method.
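The two headline numbers above are error metrics (lower is better). The following is a minimal sketch, assuming the common textbook definitions: relative depth error (AbsRel) as the mean of |predicted depth - true depth| / true depth over pixels with valid ground truth, and KITTI Fl-all as the percentage of valid pixels whose flow endpoint error exceeds both 3 px and 5% of the ground-truth flow magnitude. The function names `abs_rel` and `fl_all` are hypothetical and not taken from the paper's code. Official protocols (e.g., NYU evaluation crops and depth caps, the KITTI devkit) include details this sketch omits.

```python
import math
from typing import Sequence, Tuple

Flow = Tuple[float, float]  # (u, v) displacement of one pixel


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute relative depth error over pixels with positive (valid) ground truth."""
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth depth pixels")
    return sum(errors) / len(errors)


def fl_all(pred: Sequence[Flow], gt: Sequence[Flow], valid: Sequence[bool]) -> float:
    """Percentage of valid pixels that are flow outliers.

    Assumed outlier rule: endpoint error > 3 px AND > 5% of the ground-truth flow magnitude.
    """
    outliers = 0
    total = 0
    for (pu, pv), (gu, gv), ok in zip(pred, gt, valid):
        if not ok:
            continue  # pixels without ground truth are ignored
        total += 1
        epe = math.hypot(pu - gu, pv - gv)  # endpoint error
        magnitude = math.hypot(gu, gv)      # ground-truth flow length
        if epe > 3.0 and epe > 0.05 * magnitude:
            outliers += 1
    if total == 0:
        raise ValueError("no valid ground-truth flow pixels")
    return 100.0 * outliers / total


# Toy usage: one pixel off by 10% in depth; one of two valid flow pixels is an outlier.
print(round(abs_rel([2.2, 3.0], [2.0, 3.0]), 3))
print(fl_all([(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)],
             [(0.0, 0.0), (4.0, 0.0), (0.0, 0.0)],
             [True, True, False]))
```

The two-threshold outlier rule means a large error on a fast-moving pixel can still count as correct: an endpoint error of 4 px on a 100 px ground-truth flow stays under the 5% bound and is not an outlier.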
Submission Number: 14395