Self-Supervised Monocular Scene Decomposition and Depth Estimation

Sadra Safadoust, Fatma Güney

Published: 2021, Last Modified: 26 Feb 2026, CoRR 2021, License: CC BY-SA 4.0
Abstract: Self-supervised monocular depth estimation approaches either ignore independently moving objects in the scene or need a separate segmentation step to identify them. We propose MonoDepthSeg to jointly estimate depth and segment moving objects from monocular video without using any ground-truth labels. We decompose the scene into a fixed number of components where each component corresponds to a region on the image with its own transformation matrix representing its motion. We estimate both the mask and the motion of each component efficiently with a shared encoder. We evaluate our method on three driving datasets and show that our model clearly improves depth estimation while decomposing the scene into separately moving components.
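The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that each component has a soft mask (summing to 1 per pixel) and a 4x4 rigid transformation, and that the per-pixel motion is the mask-weighted blend of the transformed 3D points. The function names `backproject`, `compose_motion`, and `project`, the intrinsics matrix `K`, and the blending rule are all illustrative assumptions.

```python
import numpy as np


def backproject(depth, K):
    """Lift each pixel (u, v) to a 3D camera-frame point: X = depth * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T                                      # (h, w, 3)
    return rays * depth[..., None]


def compose_motion(points, masks, transforms):
    """Blend per-component rigid motions with per-pixel mask weights (assumed rule).

    points:     (h, w, 3) 3D points in the source frame
    masks:      (C, h, w) soft assignment of pixels to C components, summing to 1 over C
    transforms: (C, 4, 4) one homogeneous transformation per component
    """
    ones = np.ones(points.shape[:2] + (1,))
    homog = np.concatenate([points, ones], axis=-1)                   # (h, w, 4)
    moved = np.einsum("cij,hwj->chwi", transforms, homog)[..., :3]    # (C, h, w, 3)
    return (masks[..., None] * moved).sum(axis=0)                     # (h, w, 3)


def project(points, K):
    """Project 3D camera-frame points back to pixel coordinates."""
    proj = points @ K.T
    return proj[..., :2] / proj[..., 2:3]


if __name__ == "__main__":
    # Toy scene: constant depth, two components, the second one shifted along x.
    K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.5], [0.0, 0.0, 1.0]])
    depth = np.full((3, 4), 2.0)
    masks = np.zeros((2, 3, 4))
    masks[0, :, :2] = 1.0   # left half belongs to component 0 (static)
    masks[1, :, 2:] = 1.0   # right half belongs to component 1 (moving)
    transforms = np.stack([np.eye(4), np.eye(4)])
    transforms[1, 0, 3] = 0.1  # component 1 translates 0.1 units along x
    uv = project(compose_motion(backproject(depth, K), masks, transforms), K)
    print(uv[..., 0])  # horizontal pixel coordinate after the blended motion
```

In a self-supervised setup, the reprojected coordinates would typically be used to warp a neighboring frame and compare it against the current one with a photometric loss; that training loop is omitted here.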