Vectorizing Planar Roof Structure From Very High Resolution Remote Sensing Images Using Transformers
Abstract: Recovering the roof structure of a building is a key part of building reconstruction. Directly predicting the geometric structure of a roof as a vectorized representation from a raster image, however, remains challenging. This paper introduces an efficient and accurate parsing method based on a vision Transformer, which we call Roof-Former. Our method consists of three steps: 1) image encoder and edge node initialization, 2) image feature fusion with an enhanced segmentation refinement branch, and 3) edge filtering and structural reasoning. Compared with HEAT, the vertex and edge heat map F1-scores increase by 2.0% and 1.9%, respectively, on the VWB dataset. Qualitative evaluations also suggest that our method is superior to the current state of the art, indicating that it is effective at extracting global image information and at maintaining the consistency and topological validity of the roof structure.
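As a rough illustration of the kind of vertex F1-score reported above, the following sketch matches predicted roof vertices to ground-truth vertices and computes F1 from the resulting counts. The abstract does not describe the evaluation protocol, so everything here is an assumption: the greedy one-to-one matching, the function names `match_vertices` and `f1_score`, and the pixel threshold `max_dist` are hypothetical and are not taken from Roof-Former or HEAT.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def match_vertices(pred, gt, max_dist=8.0):
    """Greedily match each predicted vertex to the nearest unused
    ground-truth vertex within max_dist pixels (hypothetical threshold).

    Returns (true positives, false positives, false negatives).
    """
    unmatched = list(gt)
    tp = 0
    for p in pred:
        best, best_d = None, max_dist
        for g in unmatched:
            d = math.dist(p, g)
            if d <= best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.remove(best)  # each ground-truth vertex is used once
            tp += 1
    fp = len(pred) - tp
    fn = len(gt) - tp
    return tp, fp, fn


# Toy example: two of three predictions fall near a ground-truth vertex.
pred_vertices = [(10, 10), (50, 50), (200, 200)]
gt_vertices = [(12, 11), (49, 52), (100, 100)]
tp, fp, fn = match_vertices(pred_vertices, gt_vertices)
vertex_f1 = f1_score(tp, fp, fn)
```

An edge-level score could be computed the same way by treating an edge as correct only when both of its endpoints are matched, though that convention is likewise an assumption here.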