Abstract: Existing multi-view clustering methods have achieved remarkable success on general images but still face many limitations when clustering multimodal remote sensing images (RSIs). For example, these methods are sensitive to noise and spectral variability, ignore the diverse spatial structure information across modalities, or are computationally prohibitive for large-scale RSIs, which limits their applications. This
paper proposes a multi-scale spectral-spatial anchor graph
fusion (MSSAGF) method for multimodal remote sensing image
clustering. MSSAGF develops a superpixel-based nonlinear
neighborhood recovery strategy to reduce noise while enhancing
the spatial smoothness within multimodal remote sensing images.
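As an illustration of this idea, the sketch below smooths each modality by averaging spectra within SLIC superpixels; the segmentation parameters and the simple within-region mean are assumptions for exposition, not the paper's exact nonlinear recovery strategy.

```python
# Minimal sketch of superpixel-based neighborhood smoothing (illustrative;
# not MSSAGF's exact nonlinear recovery strategy).
import numpy as np
from skimage.segmentation import slic

def superpixel_smooth(image, n_segments=500, compactness=10.0):
    """Replace each pixel's spectrum with the mean spectrum of its superpixel.

    image: (H, W, B) array for one modality; returns an array of the same shape.
    """
    # SLIC groups spatially adjacent, spectrally similar pixels into superpixels.
    labels = slic(image, n_segments=n_segments, compactness=compactness,
                  channel_axis=-1, start_label=0)
    smoothed = np.empty_like(image, dtype=np.float64)
    for lab in np.unique(labels):
        mask = labels == lab
        # Averaging within a homogeneous region suppresses noise and spectral
        # variability while preserving region boundaries.
        smoothed[mask] = image[mask].mean(axis=0)
    return smoothed
```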
Using spatial-aware anchors to extract local spatial information for each modality, MSSAGF introduces multiscale local spectral-spatial anchor graphs that capture nonlinear correlations between pixels and their corresponding local regions. The small number of anchors substantially reduces graph construction and partitioning costs, making the time complexity of MSSAGF nearly linear and thus computationally feasible for
large-scale RSIs. Finally, MSSAGF develops an adaptive fusion
mechanism to fuse multiscale local anchor graphs into a unified
global anchor graph, integrating complementary information
across multiple modalities while directly obtaining the final
clustering results. Experimental results on three multimodal RSI datasets demonstrate the superiority of the proposed
method over state-of-the-art methods. Our code is publicly
available at https://github.com/W-Xinxin/MSSAGF.
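To make the anchor-graph idea concrete, here is a minimal sketch assuming k-means anchors, a Gaussian kernel, and a plain weighted average of per-modality graphs; these choices, and all names and parameters, are illustrative stand-ins, not MSSAGF's adaptive fusion mechanism.

```python
# Minimal sketch of anchor-graph clustering with multimodal fusion
# (a simplified reading of the abstract, not the authors' implementation).
import numpy as np
from sklearn.cluster import KMeans

def anchor_graph(X, m=200, k=5, sigma=1.0):
    """Build a pixel-anchor affinity matrix Z (n x m) in roughly O(n*m) time."""
    anchors = KMeans(n_clusters=m, n_init=4, random_state=0).fit(X).cluster_centers_
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # squared distances
    Z = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, :k]                # k nearest anchors per pixel
    rows = np.arange(X.shape[0])[:, None]
    Z[rows, idx] = np.exp(-d2[rows, idx] / (2 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)            # row-normalize

def fuse_and_cluster(Zs, weights, n_clusters):
    """Fuse per-modality anchor graphs and cluster via the SVD of the fused Z."""
    Z = sum(w * Zi for w, Zi in zip(weights, Zs))      # simple convex combination
    # Spectral embedding: left singular vectors of the degree-normalized anchor
    # graph stand in for eigenvectors of the full n x n affinity graph.
    deg = Z.sum(axis=0)
    Zn = Z / np.sqrt(deg + 1e-12)
    U, _, _ = np.linalg.svd(Zn, full_matrices=False)
    emb = U[:, :n_clusters]
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(emb)
```

Because the SVD acts on an n x m matrix with m anchors rather than on an n x n graph, the overall cost stays close to linear in the number of pixels, which is the source of the near-linear complexity claimed above.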