Abstract: Remote sensing commonly requires learning scale-invariant shapes of objects such as buildings. Prior works rely on tuning multiple loss functions to convert segmentation maps into the final vectorised representation, which necessitates arduous design and optimisation. To address this, we introduce the GeoFormer, a novel architecture that learns to generate multi-polygons end-to-end. By modelling keypoints as spatially dependent tokens in an auto-regressive manner, the GeoFormer outperforms existing works in delineating building objects from satellite imagery. We evaluate the robustness of the GeoFormer against prior methods through a variety of parameter ablations and highlight the advantages of optimising a single likelihood function. Our study presents the first successful application of auto-regressive transformer models to multi-polygon prediction in remote sensing, suggesting a promising methodological alternative for building vectorisation.