Keywords: indoor panoramic semantic segmentation, vertical relative distance
TL;DR: We propose a new approach for Indoor Panoramic Semantic Segmentation
Abstract: PAnoramic Semantic Segmentation (PASS) is an important task in computer vision,
as it enables semantic understanding of a 360° environment. Currently,
most existing works have focused on addressing the distortion issues in 2D
panoramic images without considering the spatial properties of indoor scenes. This
restricts the ability of PASS methods to perceive contextual attributes and resolve the ambiguity
inherent in monocular images. In this paper, we propose a novel
approach for indoor panoramic semantic segmentation. Unlike previous works,
we consider the panoramic image as a composition of two segment groups: over-sampled
segments, representing planar structures such as floors and ceilings, and
under-sampled segments, representing other scene elements. To optimize each
group, we first enhance the over-sampled segments by jointly optimizing them with a dense
depth estimation task. Then, we introduce a transformer-based context module
that aggregates different geometric representations of the scene. Combined
with a simple high-resolution branch, it serves as a robust hybrid decoder for
estimating under-sampled segments, effectively preserving the resolution of the predicted
masks while leveraging various indoor geometric properties. Experimental
results on both real-world (Stanford2D3DS, Matterport3D) and synthetic (Structured3D)
datasets demonstrate the robustness of our framework, which sets new
state-of-the-art results in almost all evaluations. The code and updated results are available
at: https://github.com/caodinhduc/vertical_relative_distance.
Primary Area: Machine vision
Submission Number: 14983