Abstract: Accurate segmentation of images taken by Unmanned Aerial Vehicles (UAVs) requires precise modeling of global contextual information. This paper presents a transformer-based method for semantic segmentation of UAV street scenes. The method uses an encoder-decoder architecture to capture both local and global context in UAV images. Experimental results show that the proposed method is competitive with state-of-the-art methods, achieving an mIoU of 61.93% on the UAVid dataset.
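For readers unfamiliar with the reported metric, mean intersection over union (mIoU) can be sketched as below. This is a minimal illustration, not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to average only over classes that appear in the prediction or the ground truth are all assumptions made for the example. Benchmark evaluation scripts typically accumulate counts over the whole test set before averaging.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over flattened per-pixel class labels (illustrative sketch)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where both agree on class c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: skip classes absent from both inputs
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical 4-pixel example with two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy call, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. A reported score such as 61.93% is this quantity expressed as a percentage.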