SARChat-Bench-2M: A Multi-Task Vision-Language Benchmark for SAR Image Interpretation

ACL ARR 2025 February Submission 3718 Authors

15 Feb 2025 (modified: 09 May 2025) · License: CC BY 4.0
Abstract: As a powerful all-weather Earth observation tool, synthetic aperture radar (SAR) remote sensing enables critical military reconnaissance, maritime surveillance, and infrastructure monitoring. Although vision-language models (VLMs) have made remarkable progress in natural language processing and image understanding, their application to professional domains remains limited by insufficient domain expertise. This paper proposes the first large-scale multimodal dialogue dataset for SAR images, named SARChat-2M, which contains approximately 2 million high-quality image-text pairs and covers diverse scenarios with detailed target annotations. This dataset not only supports key tasks such as visual understanding and object detection, but also serves as the first vision-language benchmark in the SAR domain. Through this work, we enable and evaluate VLMs' capabilities in SAR image interpretation, providing a paradigmatic framework for constructing multimodal datasets across various remote sensing vertical domains. Experiments on 16 mainstream VLMs verify the effectiveness of the dataset. The project will be released at https://anonymous.4open.science/r/SARChat-D0ED/.
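
As an illustration of how a multimodal dialogue dataset of this kind is typically consumed, the minimal sketch below builds and round-trips one hypothetical image-text record. The field names (image, task, conversations) and the JSONL layout are assumptions for illustration only, not the released SARChat-2M schema.

```python
import json

# Hypothetical record for one SAR image-text pair, in a common
# multimodal-dialogue layout (field names are assumptions, not the
# official SARChat-2M schema).
record = {
    "image": "images/ship_000123.png",   # path to a SAR image chip (hypothetical)
    "task": "object detection",          # one of the supported task types
    "conversations": [
        {"role": "user", "content": "Detect all ships in this SAR image."},
        {"role": "assistant", "content": "There are 3 ships near the harbor entrance."},
    ],
}

# Serialize/deserialize as one JSON object per line (JSONL), a format
# commonly used for large image-text corpora.
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["task"], len(parsed["conversations"]))
```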
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; evaluation; statistical testing for evaluation
Contribution Types: Data resources
Languages Studied: English
Submission Number: 3718