Keywords: Multi-task learning, monocular depth estimation, semantic segmentation, pseudo label, cross-view consistency
Abstract: Multi-task learning (MTL) for scene understanding has been actively studied by exploiting correlations among multiple tasks. This work focuses on improving the performance of an MTL network that infers depth and semantic segmentation maps from a single image. Specifically, we propose a novel MTL architecture, called Pseudo-MTL, that introduces pseudo labels for joint learning of monocular depth estimation and semantic segmentation tasks. The pseudo ground truth depth maps, generated from pretrained stereo matching methods, are leveraged to supervise the monocular depth estimation. More importantly, the pseudo depth labels serve to impose cross-view consistency on the monocular depth and segmentation maps estimated for the two views. This mitigates the mismatch problem caused by inconsistent predictions across the two views. A thorough ablation study validates that the cross-view consistency leads to a substantial performance gain by ensuring inference-view invariance for the two tasks.
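Illustrative sketch (not from the paper): the abstract does not give the exact form of the cross-view consistency term, so the NumPy snippet below shows only one plausible reading of it. It assumes a rectified stereo pair in which left pixel (y, x) corresponds to right pixel (y, x - d), where d is the left-view pseudo disparity. The function names `warp_right_to_left` and `cross_view_consistency`, the linear interpolation, and the L1 penalty are all assumptions made for illustration.

```python
import numpy as np


def warp_right_to_left(right_pred, left_disp):
    """Resample a right-view map into the left view (hypothetical helper).

    Assumes rectified stereo: left pixel (y, x) matches right pixel (y, x - d),
    where d = left_disp[y, x] is a (pseudo) disparity in pixels.
    """
    h, w = left_disp.shape
    # Horizontal source coordinate in the right view for every left pixel.
    xs = np.arange(w)[None, :] - left_disp
    # Pixels whose source falls outside the right image are marked invalid.
    valid = (xs >= 0) & (xs <= w - 1)
    xs = np.clip(xs, 0, w - 1)
    # Linear interpolation along the x axis.
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    warped = (1 - frac) * right_pred[rows, x0] + frac * right_pred[rows, x1]
    return warped, valid


def cross_view_consistency(left_pred, right_pred, left_disp):
    """Mean absolute difference between the left prediction and the warped
    right prediction, averaged over valid pixels (an assumed L1 penalty)."""
    warped, valid = warp_right_to_left(right_pred, left_disp)
    if not valid.any():
        return 0.0
    return float(np.abs(left_pred - warped)[valid].mean())
```

The same function applies to a predicted depth map or, channel by channel, to per-class segmentation probabilities. A value near zero means the two views agree after warping, while occluded regions would need extra masking that this sketch omits.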
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: This paper proposes a novel multi-task learning (MTL) architecture, called Pseudo-MTL, that leverages pseudo labels for joint learning of monocular depth estimation and semantic segmentation tasks.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=PAtmMy1zuW