Repurposing Marigold for Zero-Shot Metric Depth Estimation via Defocus Blur Cues

Chinmay Talegaonkar; Nikhil Gandudi Suresh; Zachary Novack; Yash Belhe; Priyanka Nagasamudra; Nicholas Antipa

Published: 18 Sept 2025, Last Modified: 29 Oct 2025

Venue: NeurIPS 2025 spotlight

License: CC BY 4.0

Keywords: Monocular Depth, Defocus Blur, Diffusion Models, Zero Shot Depth, Inference Time Optimization

TL;DR: Zero-shot metric depth from a pre-trained monocular depth diffusion model by incorporating optical defocus cues at inference time. Outperforms some of the existing zero-shot metric depth baselines, while also

Abstract: Recent monocular metric depth estimation (MMDE) methods have made notable progress towards zero-shot generalization. However, they still exhibit a significant performance drop on out-of-distribution datasets. We address this limitation by injecting defocus blur cues at inference time into Marigold, a pre-trained diffusion model for zero-shot, scale-invariant monocular depth estimation (MDE). Our method effectively turns Marigold into a metric depth predictor in a training-free manner. To incorporate defocus cues, we capture two images with a small and a large aperture from the same viewpoint. To recover metric depth, we then optimize the metric depth scaling parameters and the noise latents of Marigold at inference time using gradients from a loss function based on the defocus-blur image formation model. We compare our method against existing state-of-the-art zero-shot MMDE methods on a self-collected real dataset, showing quantitative and qualitative improvements.
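To illustrate why a small-aperture and large-aperture pair carries metric information, the sketch below evaluates the standard thin-lens circle-of-confusion formula on a toy depth vector. It is not the authors' implementation: the affine map in `to_metric_depth`, the function names, and the lens values are all assumptions chosen for illustration, and the abstract does not specify the exact form of the scaling parameters or the blur model.

```python
import numpy as np

def to_metric_depth(rel_depth, scale, shift):
    """Hypothetical affine map from relative depth to metres (an assumption)."""
    return scale * np.asarray(rel_depth, dtype=float) + shift

def coc_diameter(depth_m, focus_dist_m, focal_len_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres."""
    aperture = focal_len_m / f_number  # aperture diameter from the f-number
    depth_m = np.asarray(depth_m, dtype=float)
    return aperture * focal_len_m * np.abs(depth_m - focus_dist_m) / (
        depth_m * (focus_dist_m - focal_len_m)
    )

# Toy relative depths mapped to 1 m, 3 m, and 5 m (illustrative values only).
rel = np.array([0.0, 0.5, 1.0])
depth = to_metric_depth(rel, scale=4.0, shift=1.0)

# Same viewpoint and focus distance, two apertures (large: f/2, small: f/16).
coc_wide = coc_diameter(depth, focus_dist_m=3.0, focal_len_m=0.05, f_number=2.0)
coc_narrow = coc_diameter(depth, focus_dist_m=3.0, focal_len_m=0.05, f_number=16.0)
```

Under this model the blur size depends on absolute depth, so a wrong scale or shift predicts the wrong blur in the large-aperture image. That mismatch is the kind of signal a defocus-based loss could pass back to the scaling parameters, while the small-aperture image stays close to all-in-focus.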

Supplementary Material: zip

Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)

Submission Number: 14004
