Cross-Instance Contrastive Masking in Vision Transformers for Self-Supervised Hyperspectral Image Classification

Published: 06 Mar 2025, Last Modified: 01 May 2025 · SCSL @ ICLR 2025 · CC BY 4.0
Track: tiny / short paper (up to 3 pages)
Keywords: Cross-Instance Contrastive Masking, Vision Transformer, Hyperspectral Image, Self-supervision
TL;DR: The paper introduces Cross-Instance Contrastive Masking, a method for hyperspectral image classification that improves feature extraction and reduces shortcut learning through dynamic contrastive masking.
Abstract: Spurious correlations arise when models learn non-causal features, such as background artifacts, instead of meaningful class-relevant patterns. This paper proposes Cross-Instance Contrastive Masking in Vision Transformers (CICM-ViT), a novel approach to hyperspectral image (HSI) classification that attempts to reduce shortcut learning by shuffling and masking patches across instances, enforcing invariant, causal feature learning through self-supervised spectral-spatial feature extraction. By exploiting dependencies between instances, CICM-ViT dynamically masks spectral patches across instances, promoting the learning of discriminative features while reducing redundancy, especially in low-data settings. Focusing the model on global patterns rather than local spurious correlations further mitigates shortcut learning. CICM-ViT achieves state-of-the-art performance on HSI benchmarks, with 99.91% overall accuracy (OA) on Salinas, 96.88% OA on Indian Pines, and 98.88% OA on Botswana, outperforming most SOTA CNN- and Transformer-based approaches in both accuracy and efficiency with only 89,680 parameters. Further experiments on a semi-synthetic dataset demonstrate the method's robustness to spurious correlations.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 61