Training-free enhancement of satellite remote sensing VLMs via Geo-Contrastive Decoding

Zixuan Shangguan, Jingrui Zhang, Ke Xing, Zhuohao Gong, Yong Zhang, Yimeng Xu, Feng Liang

Published: 2026, Last Modified: 30 May 2026, J. Cloud Comput. 2026, CC BY-SA 4.0
Abstract: Vision language models (VLMs) have opened new avenues for satellite remote sensing image analysis and have shown promise across multiple tasks. However, in the absence of a remote sensing–oriented general VLM, existing approaches rely on retraining generic VLMs with remote sensing datasets to adapt to downstream tasks. This practice inevitably introduces statistical biases from two sources: (1) generic VLMs inherit biases from noisy web data used during pretraining, and (2) many remote sensing datasets rely on annotations generated by VLMs, which themselves introduce annotation bias. Such statistical biases exacerbate the alignment gap of remote sensing VLMs, leading to generation bias and degraded task performance. To address this issue, we propose Geo-Contrastive Decoding (Geo-CD) to enhance satellite remote sensing VLMs. Geo-CD operates at inference time and reduces over-reliance on statistical biases by contrasting two complementary perspectives: an expert view, derived from the original image, and an amateur view, derived from a noise-distorted image with only low-attention visual tokens retained. By aligning the model’s output distributions across these views, Geo-CD suppresses generation bias and thereby improves task performance. Extensive experiments across diverse remote sensing benchmarks demonstrate that Geo-CD consistently enhances task performance without additional training or external supervision.
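The abstract does not give the Geo-CD scoring rule, so the following is only a minimal sketch of the generic contrastive decoding idea it builds on, not the authors' implementation. It assumes the model has already produced next-token logits for the expert view (original image) and the amateur view (distorted image with low-attention tokens only). The function names, the contrast strength `alpha`, and the plausibility cutoff `beta` are hypothetical choices borrowed from common contrastive decoding formulations.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(expert_logits, amateur_logits, alpha=1.0, beta=0.1):
    """Score next-token candidates by contrasting expert and amateur views.

    alpha and beta are hypothetical hyperparameters (assumptions, not
    values taken from the paper).
    """
    log_p_expert = log_softmax(expert_logits)
    log_p_amateur = log_softmax(amateur_logits)
    # Plausibility constraint: keep only tokens with
    # p_expert(token) >= beta * max(p_expert).
    cutoff = math.log(beta) + max(log_p_expert)
    scores = []
    for lp_e, lp_a in zip(log_p_expert, log_p_amateur):
        if lp_e < cutoff:
            scores.append(float("-inf"))  # implausible under the expert view
        else:
            # Boost tokens the expert supports and penalize tokens the
            # amateur (bias-driven) view also favors.
            scores.append((1 + alpha) * lp_e - alpha * lp_a)
    return scores


def pick_token(expert_logits, amateur_logits, alpha=1.0, beta=0.1):
    """Greedy choice of the token index with the highest contrastive score."""
    scores = contrastive_scores(expert_logits, amateur_logits, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-token vocabulary: token 0 is favored even without visual evidence.
expert = [2.0, 1.8, -3.0]
amateur = [2.5, 0.5, -3.0]
chosen = pick_token(expert, amateur)
```

In this toy case the expert view alone would pick token 0, but because the amateur view favors token 0 even more strongly, the contrast shifts the choice to token 1. Setting `alpha=0` recovers ordinary greedy decoding on the expert view.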