Transformer framework for depth-assisted UDA semantic segmentation

Published: 01 Jan 2024, Last Modified: 11 Nov 2024, Eng. Appl. Artif. Intell. 2024, CC BY-SA 4.0
Abstract: Unsupervised domain adaptation (UDA) plays a crucial role in transferring models trained on synthetic datasets to real-world datasets. In semantic segmentation, UDA reduces the need for large numbers of dense semantic annotations. Some UDA semantic segmentation approaches already leverage depth information to enhance semantic features and improve segmentation accuracy. Building on this, we introduce a UDA multitask Transformer framework called Multi-former. Multi-former contains a semantic-segmentation network and a depth-estimation network. The depth-estimation network extracts more informative depth features, which are used both to estimate depth and to assist semantic segmentation. In addition, to address the imbalanced per-class pixel distribution in the source domain, we present a rare class mix strategy (RCM) that balances domain adaptability across all classes. To further enhance UDA semantic segmentation performance, we design a mixed label loss weight strategy (MLW), which employs different types of weights to make fuller use of the pseudo-labels. Experimental results demonstrate the effectiveness of the proposed approach, which achieves the best mean intersection over union (mIoU) of 56.1% and 76.3%, respectively, on two synthetic-to-real UDA benchmark tasks. The code and models are available at https://github.com/fz-ss/Multi-former.
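For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean intersection over union is commonly computed from a predicted label map and a ground-truth label map. It is not taken from the Multi-former repository. The function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that appear in either map are illustrative assumptions, and predicted labels are assumed to lie in `[0, num_classes)`.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes present in the prediction or the ground truth.

    Hypothetical helper for illustration; ignore_index=255 is an assumption.
    """
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)

    # Drop pixels whose ground-truth label is marked as "ignore".
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that occur at least once.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


# Tiny example: 2 classes, one ignored pixel.
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 255]])
print(mean_iou(pred, target, num_classes=2))  # 0.5
```

In the example, each class has one correctly labelled pixel and a union of two pixels, so both per-class IoU values are 0.5 and so is their mean.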