Stereoential Net: Deep Network for Learning Building Height Using Stereo Imagery

Published: 01 Jan 2023, Last Modified: 05 Jun 2025. Venue: ICONIP (13) 2023. License: CC BY-SA 4.0
Abstract: Height estimation plays a crucial role in the planning and assessment of urban development, enabling effective decision-making and evaluation of urban built areas. Accurately estimating building heights from optical remote sensing imagery is challenging because both the overall structure of complex scenes and the elevation details of individual buildings must be preserved. This paper proposes a novel end-to-end deep learning-based network (Stereoential Net) comprising a modified stereo U-Net (mSUNet) and a multi-scale differential shortcut connection module (MSDSCM) at the decoding end. The proposed Stereoential network performs a multi-scale differential fusion of decoding features to preserve fine details for improved height estimation from stereo optical imagery. Unlike existing methods, our approach does not use any multi-spectral satellite imagery; instead, it employs only freely available optical imagery, yet it achieves superior performance. We evaluate our proposed network on two benchmark datasets, the IEEE Data Fusion Contest 2018 (DFC2018) dataset and the 42-cities dataset. The 42-cities dataset comprises 42 densely populated cities in China with diverse buildings of varying shapes and sizes. The quantitative and qualitative results show that our proposed network outperforms the state-of-the-art (SOTA) algorithms on DFC2018. On the 42-cities dataset, our method reduces the root-mean-square error (RMSE) by 0.31 m compared to state-of-the-art multi-spectral approaches. The code will be made publicly available via the GitHub repository.
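For readers unfamiliar with the reported metric, the following is a minimal sketch of how an RMSE in metres between a predicted and a reference height map is commonly computed. It is not the authors' evaluation code: the function name `rmse`, the nested-list representation of the height maps, and the toy values are all assumptions made for illustration.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equally shaped height maps.

    Both arguments are hypothetical 2-D lists of heights in metres;
    this is an illustrative helper, not the paper's evaluation script.
    """
    if len(predicted) != len(reference):
        raise ValueError("height maps must have the same number of rows")

    squared_error_sum = 0.0
    pixel_count = 0
    for pred_row, ref_row in zip(predicted, reference):
        if len(pred_row) != len(ref_row):
            raise ValueError("rows must have the same number of pixels")
        for pred_height, ref_height in zip(pred_row, ref_row):
            squared_error_sum += (pred_height - ref_height) ** 2
            pixel_count += 1

    if pixel_count == 0:
        raise ValueError("height maps must contain at least one pixel")

    # Mean of the squared per-pixel errors, then the square root.
    return math.sqrt(squared_error_sum / pixel_count)


# Toy 2x2 height maps (metres), invented purely for this example.
predicted_heights = [[10.0, 12.0], [0.0, 3.0]]
reference_heights = [[11.0, 12.0], [0.0, 5.0]]
print(f"RMSE: {rmse(predicted_heights, reference_heights):.3f} m")
```

Because RMSE is expressed in the same unit as the heights, a reduction of 0.31 m can be read directly as a lower typical per-pixel height error.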