Abstract: Image alignment between Synthetic Aperture Radar (SAR) and Electro-Optical (EO) imagery enables a wide range of remote sensing applications. Traditional deep learning-based SAR-optical image matching models rely heavily on supervised learning with expensive annotated datasets, and suffer reduced accuracy and overfitting when training data is insufficient. To tackle this issue, this paper proposes a student-teacher Domain Adaptation (DA) framework that transfers deep learning models from well-annotated source domains, such as ordinary outdoor optical images, to unannotated SAR-EO target domains. Additionally, in contrast to previous methods, which typically use CNN or plain Transformer backbones to extract features from image pairs, we employ self- and cross-attention mechanisms in a Transformer to obtain feature descriptors that are conditioned on both multimodal images. The global receptive field and stronger feature extraction capability of this Transformer allow it to accommodate the large disparities of multimodal data and to handle texture-poor regions, such as rural areas with forest or desert in satellite images, where previous backbones often struggle to produce repeatable and correct interest points.
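To make the self- and cross-attention design concrete, the following is a minimal PyTorch sketch of a Transformer that alternates self-attention within each modality and cross-attention between them, so that each modality's descriptors are conditioned on both images. This is an illustrative reconstruction, not the paper's code: the module names, feature dimension, layer depth, and weight sharing across modalities are all assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's implementation):
# alternating self- and cross-attention layers that condition SAR and EO
# feature descriptors on both images.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """One multi-head attention layer with a feed-forward sublayer."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim)
        )

    def forward(self, x: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        # Self-attention when source is x itself; cross-attention otherwise.
        attn_out, _ = self.attn(x, source, source)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))


class CrossModalTransformer(nn.Module):
    """Alternates self- and cross-attention so every output token has a
    global receptive field over both the SAR and the EO image."""

    def __init__(self, dim: int = 256, depth: int = 4):
        super().__init__()
        # Blocks are shared between modalities here; a per-modality copy
        # is an equally plausible design choice.
        self.self_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(depth))
        self.cross_blocks = nn.ModuleList(AttentionBlock(dim) for _ in range(depth))

    def forward(self, feat_sar: torch.Tensor, feat_eo: torch.Tensor):
        # feat_*: (batch, num_tokens, dim) flattened coarse feature maps,
        # e.g. from a shared CNN stem applied to each image.
        for self_blk, cross_blk in zip(self.self_blocks, self.cross_blocks):
            feat_sar = self_blk(feat_sar, feat_sar)  # self-attention within SAR
            feat_eo = self_blk(feat_eo, feat_eo)     # self-attention within EO
            feat_sar = cross_blk(feat_sar, feat_eo)  # SAR tokens attend to EO
            feat_eo = cross_blk(feat_eo, feat_sar)   # EO tokens attend to SAR
        return feat_sar, feat_eo


# Usage: a 32x32 coarse feature grid per image, flattened to 1024 tokens.
sar = torch.randn(1, 1024, 256)
eo = torch.randn(1, 1024, 256)
f_sar, f_eo = CrossModalTransformer()(sar, eo)
```

The conditioned descriptors f_sar and f_eo can then be compared (e.g. by a similarity matrix over token pairs) to establish correspondences; because every token attends across both images, matches can be found even in texture-poor regions where purely local CNN descriptors are ambiguous.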