Towards Explainable Multimodal Land Cover Segmentation Using Swin Transformer

Published: 05 Nov 2025, Last Modified: 05 Nov 2025 · NLDL 2026 Abstracts · CC BY 4.0
Keywords: Vision Transformers, Explainable AI, Segmentation, Remote Sensing
Abstract: Recent advances in Vision Transformers (ViTs) demonstrate strong potential in remote sensing, providing powerful spatial feature representations for complex land cover segmentation tasks. In this study, we explore multimodal fusion of Synthetic Aperture Radar (SAR) and optical imagery for land cover mapping. We train and evaluate Swin Transformer models and employ explainable AI (xAI) techniques to analyse the contribution of each modality and feature to the model's predictions. This analysis is expected to improve the interpretability and robustness of multimodal remote sensing models for land cover segmentation.
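The abstract does not specify which xAI technique is used to quantify each modality's contribution. One common, model-agnostic option is occlusion-based attribution: replace one modality's input with a baseline value and measure how much the prediction changes. The sketch below (all function and variable names are hypothetical, not from the paper) illustrates the idea with a toy stand-in for the segmentation model:

```python
import numpy as np

def modality_attribution(model, sar, optical, baseline=0.0):
    """Occlusion-style attribution: replace one modality with a
    baseline and measure the mean absolute change in the model's
    output. Larger change = larger contribution of that modality."""
    full = model(sar, optical)
    drop_sar = np.mean(np.abs(full - model(np.full_like(sar, baseline), optical)))
    drop_opt = np.mean(np.abs(full - model(sar, np.full_like(optical, baseline))))
    return {"SAR": float(drop_sar), "optical": float(drop_opt)}

# Toy stand-in for a segmentation logit map (NOT the paper's Swin model):
# a fixed weighted sum of the two modalities.
toy_model = lambda s, o: 0.3 * s + 0.7 * o

sar = np.ones((4, 4))
optical = np.ones((4, 4))
scores = modality_attribution(toy_model, sar, optical)
# For this toy model, the attribution recovers the mixing weights:
# scores["SAR"] == 0.3, scores["optical"] == 0.7
```

In a real experiment, `model` would be the trained Swin Transformer and the inputs would be co-registered SAR and optical tiles; the same occlusion loop can also be applied per channel or per patch for finer-grained feature attribution.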
Serve As Reviewer: ~Islomjon_Shukhratov1
Submission Number: 16