Abstract: Monitoring changes in the Earth’s surface is crucial for understanding natural processes and human impacts, and it requires precise and comprehensive interpretation methods. Remote sensing (RS) satellite imagery offers a unique perspective for monitoring these changes, which has made RS image change interpretation (RSICI) a significant research focus. Current RSICI technology encompasses change detection and change captioning, each of which has limitations in providing comprehensive interpretation. To address this, we propose an interactive Change-Agent, which can follow user instructions to achieve comprehensive change interpretation and insightful analysis, including change detection, change captioning, change object counting, and change cause analysis. The Change-Agent integrates a multilevel change interpretation (MCI) model as the eyes and a large language model (LLM) as the brain. The MCI model contains two branches, one for pixel-level change detection and one for semantic-level change captioning, in which the BI-temporal iterative interaction (BI3) layer is proposed to enhance the model’s discriminative feature representation capabilities. To support the training of the MCI model, we build the LEVIR-MCI dataset with a large number of change masks and captions of changes. Experiments demonstrate the state-of-the-art (SOTA) performance of the MCI model in achieving change detection and change description simultaneously. They also highlight the promising application value of our Change-Agent in facilitating comprehensive interpretation of surface changes, which opens up a new avenue for intelligent RS applications. To facilitate future research, we will make our dataset and codebase publicly available at https://github.com/Chen-Yang-Liu/Change-Agent.