Towards Explainable Multimodal Land Cover Segmentation Using Swin Transformer

Published: 05 Nov 2025, Last Modified: 05 Nov 2025 · NLDL 2026 Abstracts · CC BY 4.0
Keywords: Vision Transformers, Explainable AI, Segmentation, Remote Sensing
Abstract: Recent advances in Vision Transformers (ViTs) demonstrate strong potential in remote sensing, providing powerful spatial feature representations for complex land cover segmentation tasks. In this study, we explore multimodal fusion of Synthetic Aperture Radar (SAR) and optical imagery for land cover mapping. We train and evaluate Swin Transformer models and employ explainable AI (xAI) techniques to analyse the contribution of each modality and feature to the model's predictions. This analysis is expected to improve the interpretability and robustness of multimodal remote sensing models for land cover segmentation.
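The abstract does not specify which xAI technique is used to quantify each modality's contribution. One common, model-agnostic option is occlusion-based attribution: replace one modality's input with a baseline value and measure how much the prediction changes. The sketch below (all function and variable names are hypothetical, not from the paper) illustrates the idea with a toy stand-in for the segmentation model:

```python
import numpy as np

def modality_attribution(model, sar, optical, baseline=0.0):
    """Occlusion-style attribution: replace one modality with a
    baseline and measure the mean absolute change in the model's
    output. Larger change = larger contribution of that modality."""
    full = model(sar, optical)
    drop_sar = np.mean(np.abs(full - model(np.full_like(sar, baseline), optical)))
    drop_opt = np.mean(np.abs(full - model(sar, np.full_like(optical, baseline))))
    return {"SAR": float(drop_sar), "optical": float(drop_opt)}

# Toy stand-in for a segmentation logit map (NOT the paper's Swin model):
# a fixed weighted sum of the two modalities.
toy_model = lambda s, o: 0.3 * s + 0.7 * o

sar = np.ones((4, 4))
optical = np.ones((4, 4))
scores = modality_attribution(toy_model, sar, optical)
# For this toy model, the attribution recovers the mixing weights:
# scores["SAR"] == 0.3, scores["optical"] == 0.7
```

In a real experiment, `model` would be the trained Swin Transformer and the inputs would be co-registered SAR and optical tiles; the same occlusion loop can also be applied per channel or per patch for finer-grained feature attribution.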
Serve As Reviewer: ~Islomjon_Shukhratov1
Submission Number: 16