Visual grounding of remote sensing images with multi-dimensional semantic-guidance

Yueli Ding, Di Wang, Ke Li, Xiaohong Zhao, Yifeng Wang

Published: 2025, Last Modified: 26 Jun 2025. Pattern Recognit. Lett. 2025. CC BY-SA 4.0

Abstract: Highlights
• Introduce MSVG, a novel framework for visual grounding in remote sensing.
• Propose an MTAM module for multi-stage visual–textual feature alignment.
• Propose a VEFM module that refines correlation, ensuring precise localization.
• Achieve new state-of-the-art (SOTA) results on both the RefCOCO and DIOR-RSVG datasets.