[Re] Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Anonymous

05 Feb 2022 (modified: 05 May 2023) · ML Reproducibility Challenge 2021 Fall Blind Submission
Keywords: monocular depth estimation, local boosting, high resolution, merging network, double estimation, multi-megapixel images
Abstract:

Scope of Reproducibility
The authors propose a method to improve monocular depth estimation on multi-megapixel images with existing depth estimation models by merging estimates from lower and higher resolutions. Low-resolution estimates are structurally consistent but lack detail, whereas high-resolution estimates capture more detail but produce artifacts. The two estimates are merged into an improved base estimate using an image-to-image translation network (a minimal sketch of this double-estimation idea follows the abstract). The base estimate is further enhanced with local boosting. We aim to reproduce the desired effects on low- and high-resolution depth maps and to verify that merging and boosting improve the accuracy of the final estimate, using both the same depth estimation models and data as the authors and additional ones.

Methodology
We used the code provided by the authors as a baseline and modified it to run the whole pipeline on various depth estimation models. For our benchmark experiments, we considered the same iBims-1 dataset as the authors, as well as a higher-resolution dataset they did not use, namely DIODE. Our experiments were performed on Google Colab GPU instances (NVIDIA Tesla K80), and each benchmark run took several hours, depending on the model and dataset.

Results
Using the authors' code, the pre-trained weights for all models, and the same iBims-1 dataset, our error metrics were significantly lower than those reported in the paper, even though we could still see that the proposed method does improve the overall accuracy of higher-resolution depth estimates. The authors then provided us with an updated evaluation method that removes some values from the iBims-1 ground-truth data, which affects the normalization step (a sketch of such a masked normalization is given after the abstract). With this previously unmentioned data-processing step, our metrics matched the paper almost exactly. We were additionally able to demonstrate the improvements, both visually and quantitatively, with models and data other than those originally used.

What was easy
The code provided by the authors is easy to navigate and understand, and we did not find any major contradictions with the published paper. Moreover, we were able to extend the whole pipeline to run other depth estimation models with little effort.

What was difficult
The proposed method performs multiple inferences per image, and with up to 150 patches per image that need to be estimated and merged into the base estimate, the whole process is very time-consuming and computationally expensive. Running the whole pipeline without GPU instances is not recommended.

Communication with original authors
The authors provided us with an updated evaluation script that applies a guided filter to the iBims-1 ground-truth data to remove some values; with it, we obtained the same metrics as reported in the paper.
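The double-estimation step referenced in the Scope of Reproducibility can be summarized in a few lines. The following is a minimal sketch, not the authors' implementation: `depth_model` and `merge_net` are hypothetical stand-ins for any monocular depth network and the pix2pix-style merging network, and the fixed resolutions simplify the paper's content-adaptive choice of the high resolution.

```python
import torch
import torch.nn.functional as F

def double_estimate(depth_model, image, low_res=384, high_res=1024):
    """Run the same depth model at a low resolution (structurally
    consistent, little detail) and a high resolution (detailed, but prone
    to artifacts), then resize both estimates to the input size.
    Note: the paper selects the high resolution adaptively from image
    content; a fixed value is used here only for illustration."""
    # `image` is a (1, 3, H, W) tensor; `depth_model` maps it to (1, 1, h, w).
    low = depth_model(F.interpolate(image, size=(low_res, low_res),
                                    mode="bilinear", align_corners=False))
    high = depth_model(F.interpolate(image, size=(high_res, high_res),
                                     mode="bilinear", align_corners=False))
    size = image.shape[-2:]
    low = F.interpolate(low, size=size, mode="bilinear", align_corners=False)
    high = F.interpolate(high, size=size, mode="bilinear", align_corners=False)
    return low, high

def merge_base_estimate(merge_net, low, high):
    """Merge the two estimates with an image-to-image translation network
    (pix2pix-style in the paper); the estimates are stacked along the
    channel dimension as the network input."""
    return merge_net(torch.cat([low, high], dim=1))
```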
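The evaluation discrepancy described under Results comes down to which ground-truth pixels enter the scale-and-shift normalization: removing values from the ground truth changes the least-squares fit and therefore all downstream metrics. Below is a hedged sketch of such a masked normalization, assuming relative depth predictions and using `gt > 0` as an illustrative validity mask; the authors' updated script additionally removes values with a guided filter, which is not reproduced here.

```python
import numpy as np

def align_scale_shift(pred, gt, mask):
    """Least-squares scale-and-shift alignment of a relative depth
    prediction to metric ground truth, fitted only on valid pixels.
    Changing which pixels count as valid changes the fit, which is why
    the extra ground-truth filtering step changed the reported metrics."""
    p, g = pred[mask], gt[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, g, rcond=None)
    return scale * pred + shift

def evaluate_rmse(pred, gt):
    """RMSE after alignment; gt == 0 marking invalid pixels is an
    assumption for illustration, not the authors' exact convention."""
    mask = gt > 0
    aligned = align_scale_shift(pred, gt, mask)
    return np.sqrt(np.mean((aligned[mask] - gt[mask]) ** 2))
```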
Paper Url: https://arxiv.org/abs/2105.14021
Paper Venue: CVPR 2021