Unpaved road segmentation of UAV imagery via a global vision transformer with dilated cross window self-attention for dynamic map

Published: 01 Jan 2025, Last Modified: 12 Apr 2025Vis. Comput. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Road segmentation is a fundamental task for dynamic map in unmanned aerial vehicle (UAV) path navigation. In unplanned, unknown and even damaged areas, there are usually unpaved roads with blurred edges, deformations and occlusions. These challenges of unpaved road segmentation pose significant challenges to the construction of dynamic maps. Our major contributions have: (1) Inspired by dilated convolution, we propose dilated cross window self-attention (DCWin-Attention), which is composed of a dilated cross window mechanism and a pixel regional module. Our goal is to model the long-range horizontal and vertical road dependencies for unpaved roads with deformation and blurred edges. (2) A shifted cross window mechanism is introduced through coupling with DCWin-Attention to reduce the influence of occluded roads in UAV imagery. In detail, the GVT backbone is constructed by using the DCWin-Attention block for multilevel deep features with global dependency. (3) The unpaved road is segmented with the confidence map generated by fusing the deep features of different levels in a unified perceptual parsing network. We verify our method on the self-established BJUT-URD dataset and public DeepGlobe dataset, which achieves 67.72 and 52.67% of the highest IoU at proper inference efficiencies of 2.7, 2.8 FPS, respectively, demonstrating its effectiveness and superiority in unpaved road segmentation. Our code is available at https://github.com/BJUT-AIVBD/GVT-URS.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview