A Strong Vision Transformer Adapter with Adaptive Thresholding for Fine-Grained Building Classification

Published: 01 Jan 2023, Last Modified: 13 Nov 2023, IGARSS 2023, Readers: Everyone
Abstract: Fine-grained building classification provides a solid basis for comparing city morphologies and investigating urban planning. To this end, DFC23 establishes a large-scale, multi-modal benchmark for classifying building roof types. However, a long-tailed class distribution, insufficient data, inter-class similarity, and intra-class variation severely limit the performance of the detector. In this work, we build a strong vision transformer adapter, fine-tuned on cropped building instances, to enhance feature extraction, and we design a cross-modal fusion (CMF) module to effectively aggregate features from RGB and SAR data. When transferring to building instance segmentation, we construct a robust training pipeline and a two-stage test-time ensemble scheme. Furthermore, we introduce self-training with two key denoising techniques, global average filtering (GAF) and intra-class adaptive thresholding (IAT), to improve the generalization of the model. Experimental results demonstrate the effectiveness of our method, which ranked 2nd in the test phase of the contest.