Multiview Feature Aggregation for Facade Parsing

Wenguang Ma, Shibiao Xu, Wei Ma, Hongbin Zha

Published: 2022; Last Modified: 27 Jan 2026; IEEE Geosci. Remote. Sens. Lett. 2022; CC BY-SA 4.0

Abstract: Facade image parsing is essential to the semantic understanding and 3-D reconstruction of urban scenes. Single-view images suffer from occlusion and appearance ambiguity, whereas multiple views of a facade are easy to acquire. In this letter, we therefore propose a multiview-enhanced deep architecture for facade parsing. The highlight of this architecture is a cross-view feature aggregation module that learns to select and fuse useful convolutional neural network (CNN) features from nearby views to enhance the representation of a target view. Benefiting from this multiview-enhanced representation, the proposed architecture copes better with ambiguity and occlusion. Moreover, the cross-view feature aggregation module can be integrated straightforwardly into existing single-image parsing frameworks. Extensive comparison experiments and ablation studies demonstrate the performance of the proposed method and the validity and portability of the cross-view feature aggregation module.
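The abstract does not specify how the cross-view feature aggregation module selects and fuses features, so the following is only an illustrative sketch of one common way such a module could work, not the authors' design. It assumes that neighbor-view features have already been aligned to the target view, scores each neighbor against the target with a scaled dot product, and adds the softmax-weighted neighbor features to the target as a residual. The function name `aggregate_views` and the data layout (one feature vector per spatial position) are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(target, neighbors):
    """Fuse neighbor-view features into the target view (hypothetical sketch).

    target:    list of feature vectors, one per spatial position.
    neighbors: list of views with the same layout as `target`,
               ASSUMED to be already aligned to the target view.
    Returns the target features plus attention-weighted neighbor features.
    """
    fused = []
    for pos, t_feat in enumerate(target):
        candidates = [view[pos] for view in neighbors]
        scale = math.sqrt(len(t_feat))
        # Similarity to the target decides how much each neighbor contributes.
        weights = softmax([dot(t_feat, c) / scale for c in candidates])
        agg = [
            sum(w * c[k] for w, c in zip(weights, candidates))
            for k in range(len(t_feat))
        ]
        # Residual connection keeps the original target representation.
        fused.append([t + a for t, a in zip(t_feat, agg)])
    return fused
```

In this sketch a neighbor whose feature resembles the target's receives a larger weight, which loosely mirrors the idea of choosing useful features from nearby views; in a trained network the similarity scores would come from learned projections rather than raw dot products.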