Abstract: Advancements in Earth observation (EO) have increased the volume of multimodal geospatial data and made it easier to access, broadening the reach of environmental monitoring and analysis. However, understanding how each input modality influences decision-making within deep learning models remains an open challenge. This letter proposes a deep occlusion framework to enhance the interpretability of a multimodal model for land naturalness assessment. The model is trained on a supervised pixelwise regression task for naturalness mapping, with Sentinel-2 and Sentinel-1 imagery, land cover maps, and nighttime lights intensity data as input modalities. The proposed framework systematically occludes individual input modalities to produce modality-level influence scores. Each score is the distance between the embedding of the nonoccluded input and the embedding of the same input with a single modality occluded, revealing how that modality influences predictions and clarifying its contribution (and, thus, importance) in the model’s decision-making process. The results show how input modalities influence the model’s decisions at two levels: the sample level, which enables regional case studies, and the dataset level, which allows for data pruning and thereby shorter training and inference times. The code is available at https://github.com/burakekim/embedding_occlusion.
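The scoring idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the released implementation: the names `occlude`, `influence_scores`, `dataset_influence`, and `toy_encoder` are hypothetical, and the zero fill value, the Euclidean distance, and the averaging used for the dataset level are all assumptions, since the abstract specifies none of them.

```python
import math
from typing import Callable, Dict, List, Sequence

Sample = Dict[str, Sequence[float]]   # modality name -> flattened values
Encoder = Callable[[Sample], List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def occlude(sample: Sample, modality: str, fill: float = 0.0) -> Sample:
    """Return a copy of the sample with one modality replaced by a constant.

    Zero filling is an assumption; other occlusion values are possible.
    """
    occluded = dict(sample)
    occluded[modality] = [fill] * len(sample[modality])
    return occluded


def influence_scores(encoder: Encoder, sample: Sample) -> Dict[str, float]:
    """Sample-level score: distance between the embedding of the full input
    and the embedding of the input with a single modality occluded."""
    reference = encoder(sample)
    return {
        name: euclidean(reference, encoder(occlude(sample, name)))
        for name in sample
    }


def dataset_influence(encoder: Encoder, samples: Sequence[Sample]) -> Dict[str, float]:
    """Dataset-level score: mean of sample-level scores (assumed aggregation)."""
    per_sample = [influence_scores(encoder, s) for s in samples]
    return {
        name: sum(scores[name] for scores in per_sample) / len(per_sample)
        for name in per_sample[0]
    }


def toy_encoder(sample: Sample) -> List[float]:
    """Hypothetical stand-in for a trained model's encoder: one embedding
    entry per modality, a fixed weight times the modality mean."""
    weights = {"s2": 1.0, "s1": 0.5, "landcover": 0.25, "lights": 0.0}
    return [weights[name] * sum(v) / len(v) for name, v in sample.items()]


# Toy input with the four modalities named in the abstract (values invented).
SAMPLE: Sample = {
    "s2": [2.0, 4.0],         # Sentinel-2
    "s1": [1.0, 3.0],         # Sentinel-1
    "landcover": [4.0, 4.0],  # land cover map
    "lights": [5.0, 7.0],     # nighttime lights intensity
}

if __name__ == "__main__":
    scores = influence_scores(toy_encoder, SAMPLE)
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

In this toy setup the `lights` modality receives a score of zero because the stand-in encoder ignores it. A modality with a similarly low dataset-level score would be a candidate for pruning in the sense the abstract describes.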