Abstract: Monocular depth estimation has gained great momentum and achieved growing success in recent years. Nonetheless, owing to the intrinsic difficulty of capturing large-scale RGB-D data for training and the inefficient use of existing training datasets, it remains challenging to accommodate flexibly changing scenarios. To address this, we propose a few-shot learning method for monocular depth estimation augmented by local object-object relationships. Our method is based on the insight that the depth change between neighboring objects is relatively stable across diverse yet similar scenarios. Technically, we first learn object relationships from the relative distances between individual objects. To this end, we design a CNN architecture that simultaneously encodes the object spatial context into object-object relationship features and the original image into global context features. We can thus complementarily leverage a few-shot dataset with only a few samples for depth estimation while preserving the global depth range and respecting the local object-object depth details. As a result, our approach can estimate depth from a variety of indoor RGB images, greatly alleviating the training-dataset dependency of monocular depth estimation. Finally, we conduct extensive experiments and comprehensive evaluations on widely used public benchmarks, all of which confirm that our method outperforms state-of-the-art depth estimation methods, especially when only small-scale training samples are available.
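The core idea above — that relative depth offsets between neighboring objects stay stable across similar scenes, and can be fused with a predicted global depth range — can be sketched as follows. This is an illustrative toy, not the paper's implementation; all names, the anchor-propagation scheme, and the numeric values are hypothetical.

```python
import numpy as np

def fuse_depth(global_range, anchor_depth, rel_offsets, edges):
    """Toy fusion of global and local depth cues (hypothetical, not the
    paper's method): propagate learned object-object relative depth
    offsets from one anchor object, then clamp to the predicted global
    depth range of the scene.

    global_range : (min_d, max_d) predicted scene depth range
    anchor_depth : assumed depth of object 0 within that range
    rel_offsets  : dict mapping (i, j) -> depth(j) - depth(i)
    edges        : neighbor pairs to propagate over, parent first
    """
    depths = {0: anchor_depth}
    for i, j in edges:
        # local cue: neighboring objects keep a stable relative offset
        depths[j] = depths[i] + rel_offsets[(i, j)]
    lo, hi = global_range
    # global cue: respect the scene's overall depth range
    return {k: float(np.clip(v, lo, hi)) for k, v in depths.items()}

# usage: object 1 sits 0.5 m behind object 0; object 2 sits 0.3 m in front of 1
d = fuse_depth((0.5, 10.0), 2.0,
               {(0, 1): 0.5, (1, 2): -0.3},
               [(0, 1), (1, 2)])
```

In the actual method both cue streams are produced by CNN branches; the sketch only shows why combining them constrains per-object depths more tightly than either cue alone.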