Keywords: indoor panoramic semantic segmentation, vertical relative distance
TL;DR: We propose a new approach for Indoor Panoramic Semantic Segmentation
Abstract: PAnoramic Semantic Segmentation (PASS) is an important task in computer vision,
as it enables semantic understanding of a 360° environment. Currently,
most existing works have focused on addressing the distortion issues in 2D
panoramic images without considering the spatial properties of indoor scenes. This
restricts the ability of PASS methods to perceive contextual attributes and resolve the ambiguity
inherent in monocular images. In this paper, we propose a novel
approach for indoor panoramic semantic segmentation. Unlike previous works,
we consider the panoramic image as a composition of two segment groups: over-sampled
segments, representing planar structures such as floors and ceilings, and
under-sampled segments, representing other scene elements. To optimize each
group, we first enhance the over-sampled segments by jointly optimizing them with a dense
depth estimation task. Then, we introduce a transformer-based context module
that aggregates different geometric representations of the scene. Combined
with a simple high-resolution branch, it serves as a robust hybrid decoder for
estimating under-sampled segments, effectively preserving the resolution of the predicted
masks while leveraging various indoor geometric properties. Experimental
results on both real-world (Stanford2D3DS, Matterport3D) and synthetic (Structured3D)
datasets demonstrate the robustness of our framework, which sets new
state-of-the-art results in almost all evaluations. The code and updated results are available
at: https://github.com/caodinhduc/vertical_relative_distance.
Primary Area: Machine vision
Submission Number: 14983