ESAN: An Efficient Semantic Attention Network for Remote Sensing Image Change Captioning

ACL ARR 2024 June Submission 2228 Authors

15 Jun 2024 (modified: 03 Jul 2024) · CC BY 4.0
Abstract: With the continuous progress of remote sensing technology, an increasing number of remote sensing images containing rich geographical and environmental information are being acquired. Unlike natural images, remote sensing images usually cover large areas and have complex spatial distributions, making it challenging to accurately extract and describe changes between images. To effectively mine and exploit the rich semantic information in these images and guide the decoder to generate high-quality change descriptions, we propose an efficient semantic attention network (ESAN). Specifically, we first apply global efficient semantic representation (GESR) to the extracted remote sensing feature maps to promote the understanding of complex scenes in remote sensing images. We then propose a cross-semantic feature enhancement (CSFE) module to effectively distinguish semantic changes from irrelevant ones. Finally, we feed the resulting image features into an adaptive multi-layer Transformer decoder to guide the generation of change descriptions. Extensive experiments on two representative remote sensing datasets, Dubai-CC and LEVIR-CC, demonstrate the superiority of the proposed model over many state-of-the-art methods.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: image text matching, speech and vision, multimodality
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 2228