Visual grounding of remote sensing images with multi-dimensional semantic-guidance

Published: 01 Jan 2025, Last Modified: 26 Jun 2025, Pattern Recognit. Lett. 2025, CC BY-SA 4.0
Abstract: Highlights
• Introduce MSVG, a novel framework for visual grounding in remote sensing.
• Propose an MTAM module for multi-stage visual–textual feature alignment.
• Propose a VEFM module that refines correlation, ensuring precise localization.
• Achieve new SOTA results on both the RefCOCO and DIOR-RSVG datasets.