AAFormer: Attention-Attended Transformer for Semantic Segmentation of Remote Sensing Images

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · IEEE Geosci. Remote Sens. Lett. 2024 · CC BY-SA 4.0
Abstract: Rapid advances in remote sensing technology have made fine-resolution remote sensing images (RSIs) widely available, offering rich spatial detail and semantics. Although transformers are applicable and scalable for semantic segmentation of RSIs because they learn pairwise contextual affinity, they inevitably introduce irrelevant context, hindering accurate inference of patch semantics. To address this, we propose a novel multihead attention-attended module (AAM) that refines the multihead self-attention mechanism (AM). By considering the relevance between the self-attention maps and the query vector, the AAM filters out irrelevant context while highlighting informative context: it generates an attention gate that complements the contextual affinity and simultaneously assigns higher weights to the useful affinities. Using the multihead AAM as the core unit, we construct a lightweight attention-attended transformer block (ATB). We then devise AAFormer, a pure transformer with a mask transformer decoder, for semantic segmentation of RSIs. Extensive evaluations on the ISPRS Potsdam and LoveDA datasets demonstrate compelling performance compared with mainstream methods, and additional experiments analyze the effect of the AAM.
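
The abstract describes the AAM only at a high level. The sketch below is a minimal, illustrative PyTorch rendering of the general idea of gating multihead self-attention with a query-conditioned gate; the gate branch (`gate_q`, `gate_k`), the sigmoid gating, and the renormalization step are assumptions made for illustration, not the paper's actual AAM formulation.

```python
import torch
import torch.nn as nn

class GatedMultiheadSelfAttention(nn.Module):
    """Illustrative sketch of an "attention-attended" head: standard
    multihead self-attention whose affinity map is modulated by a
    query-conditioned gate. The gating design is an assumption for
    illustration, not the paper's actual AAM."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        # Hypothetical gate branch: a second query/key projection whose
        # dot product decides which contextual affinities to keep.
        self.gate_q = nn.Linear(self.head_dim, self.head_dim)
        self.gate_k = nn.Linear(self.head_dim, self.head_dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)            # each (B, H, N, d)

        # Standard pairwise contextual affinity.
        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, H, N, N)
        attn = attn.softmax(dim=-1)

        # Assumed attention gate in (0, 1): suppresses irrelevant context
        # and emphasizes informative query-key pairs.
        gate = torch.sigmoid(
            (self.gate_q(q) @ self.gate_k(k).transpose(-2, -1)) * self.scale
        )                                               # (B, H, N, N)
        attn = attn * gate
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp(min=1e-6)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Toy usage: 256 patch tokens with 64-dim embeddings.
tokens = torch.randn(2, 256, 64)
block = GatedMultiheadSelfAttention(dim=64, num_heads=8)
print(block(tokens).shape)  # torch.Size([2, 256, 64])
```

Multiplying the softmax affinities by a (0, 1) gate and renormalizing is one plausible way to realize "filtering irrelevant context while emphasizing useful affinities"; the paper may instead combine the gate additively or at a different point in the block.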