Laformer: Vision Transformer for Panoramic Image Semantic Segmentation

Zheng Yuan; Junhua Wang; Yuxin Lv; Ding Wang; Yi Fang

Published: 01 Jan 2023, Last Modified: 15 May 2024. Venue: IEEE Signal Process. Lett. 2023. License: CC BY-SA 4.0

Abstract: Recent years have seen great advances in semantic segmentation. However, general methods target pinhole images and tend to underperform when applied directly to panoramic images. Given the wide use of panoramic cameras, it is important to develop feasible approaches for training segmentation models suited to real-time applications. To address this problem, we propose a novel method based on self-training and achieve competitive results on the DensePASS dataset. First, we propose a deformable merge module tailored for panoramic images that efficiently and accurately incorporates features from different levels. Second, we design a novel prototype adaptation term that helps the model better learn the class-wise feature embeddings of distorted objects. Finally, we use a simple and effective evaluation method to achieve real-time and improved inference performance. Combined, these components reach an mIoU of 58.27% on the DensePASS dataset, a new state-of-the-art result.
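For readers unfamiliar with the reported metric, mean intersection over union (mIoU) averages, over classes, the ratio of correctly labeled pixels to all pixels that were either predicted as or annotated with that class. The following is a minimal sketch of the standard definition, not the authors' evaluation code. The function name `mean_iou`, the flattened label lists, and the `ignore_index=255` convention for unlabeled pixels are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth.

    pred, target: flat sequences of integer class labels of equal length.
    Predictions are assumed to lie in [0, num_classes).
    ignore_index: ground-truth label to skip (assumed convention).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, excluded from the metric
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class

    # Classes that never occur are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2
    print(mean_iou(pred, target, num_classes=3))
```

In practice the counts are accumulated over the whole validation set before the per-class ratios are taken, rather than averaged image by image.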
