Improved RGB-D Indoor Semantic Segmentation using Cascaded Loss Fusion

Sonali Patil, Anita Sellent, Andreas Gerndt, Georgia Albuquerque

Published: 17 Jan 2024, Last Modified: 31 Jul 2024, OpenReview Archive Direct Upload, License: CC BY-NC-ND 4.0

Abstract: Semantic segmentation of images promises numerous benefits for augmented reality applications. However, typical scenes in such applications are challenging for current segmentation algorithms because object appearance and spatial distribution vary widely. We propose a new cascaded loss fusion strategy that improves the training schedule of state-of-the-art real-time RGB-D semantic segmentation architectures. We employ methods developed in the context of multi-task learning to address the multi-class and multi-loss learning problems in semantic segmentation. In a quantitative evaluation on the NYUv2 [3] and SUNRGB-D [4] benchmark datasets, we show improvements over state-of-the-art approaches. Furthermore, our approach improves results qualitatively, both on the benchmark datasets and on our own recordings of scenarios typical for head-mounted cameras.
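The abstract does not say which multi-task learning method is used or how the cascade is structured. To make the idea of "multi-loss learning" concrete, here is a minimal sketch assuming one widely used multi-task technique: homoscedastic-uncertainty weighting (Kendall et al., CVPR 2018), in its common simplified form, where each loss term L_i gets a learnable log-variance s_i and the fused objective is sum_i exp(-s_i) * L_i + s_i. The function names `fused_loss`, `grad_log_vars` and `fit_log_vars` are hypothetical, and this is not presented as the authors' cascaded fusion.

```python
import math
from typing import List, Sequence


def fused_loss(losses: Sequence[float], log_vars: Sequence[float]) -> float:
    """Hypothetical uncertainty-weighted fusion: sum_i exp(-s_i) * L_i + s_i."""
    if len(losses) != len(log_vars):
        raise ValueError("need one log-variance per loss term")
    # exp(-s_i) is the weight of term i; the +s_i term penalizes large
    # variances so the weights cannot all shrink to zero.
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))


def grad_log_vars(losses: Sequence[float], log_vars: Sequence[float]) -> List[float]:
    """Gradient of fused_loss w.r.t. each s_i: 1 - exp(-s_i) * L_i."""
    return [1.0 - math.exp(-s) * l for l, s in zip(losses, log_vars)]


def fit_log_vars(losses: Sequence[float], steps: int = 2000, lr: float = 0.05) -> List[float]:
    """Plain gradient descent on the s_i while the loss values are held fixed.

    Assumption for illustration only: in real training the L_i change every
    iteration and the s_i are optimized jointly with the network weights.
    """
    log_vars = [0.0] * len(losses)
    for _ in range(steps):
        grads = grad_log_vars(losses, log_vars)
        log_vars = [s - lr * g for s, g in zip(log_vars, grads)]
    return log_vars


if __name__ == "__main__":
    # Hypothetical values for three loss terms (e.g. different decoder scales).
    losses = [0.5, 2.0, 4.0]
    s = fit_log_vars(losses)
    weights = [math.exp(-si) for si in s]
    print("weights:", [round(w, 4) for w in weights])
    print("fused loss:", round(fused_loss(losses, s), 4))
```

Because each term exp(-s_i) * L_i + s_i is convex in s_i, gradient descent settles at s_i = log L_i, where the weight of term i becomes 1 / L_i. Terms with larger loss are therefore down-weighted automatically, which is the kind of balancing a multi-loss training schedule needs instead of hand-tuned constants.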