Abstract: Cross-view consensus representation plays a critical role in hyperspectral image (HSI) clustering. Recent multi-view contrastive clustering methods use a contrastive loss to extract contextual consensus representations. However, these methods share a critical flaw: contrastive learning may treat similar heterogeneous views as positive pairs and dissimilar homogeneous views as negative pairs. Moreover, representations learned via a self-supervised contrastive loss are not specifically designed for clustering. To tackle these challenges, we propose a novel multi-view clustering method, Enhanced Multi-View Contrastive Clustering (EMVCC). First, spatial multi-views are designed to learn diverse features for contrastive clustering, and the globally relevant information of the spectral view is extracted by a Transformer, enhancing the spatial multi-view differences between neighboring samples. Then, a joint self-supervised loss is designed to constrain the consensus representation from different perspectives and efficiently avoid false negative pairs. Specifically, to preserve the diversity of multi-view information, the features are enhanced with a probabilistic contrastive loss, and the data are projected into a semantic representation space in which similar samples lie closer together. Finally, we design a novel clustering loss that aligns the view feature representations with high-confidence pseudo-labels, encouraging the network to learn cluster-friendly features. During training, the joint self-supervised loss is used to optimize the cross-view features. Extensive experiments on numerous benchmarks verify the superiority of EMVCC over state-of-the-art clustering methods. The code is available at https://github.com/YiLiu1999/EMVCC.
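To make the contrastive-learning setup concrete, the following is a minimal NumPy sketch of a standard NT-Xent multi-view contrastive loss, where each sample's embedding in the other view is its positive and all remaining in-batch embeddings are negatives. This is a generic illustration of the false-negative issue the abstract describes, not the EMVCC probabilistic contrastive loss itself; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Standard NT-Xent contrastive loss over two views (illustrative sketch).

    z1, z2: (N, D) embeddings of the same N samples under two views.
    Each row's positive is its counterpart in the other view; all other
    2N - 2 in-batch embeddings act as negatives. Note that dissimilar
    homogeneous views of the same class still count as negatives here,
    which is exactly the false-negative problem the paper targets.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = z1.shape[0]
    # Row i's positive sits at i + n (first half) or i - n (second half).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Under this loss, near-identical views yield a lower loss than unrelated views, since the positive pair dominates the softmax denominator.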
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: The enhanced multi-view contrastive clustering technique integrates multiple perspectives of hyperspectral images to learn more discriminative and generalizable feature representations. It supports multimodal fusion, dimensionality reduction, and unsupervised learning, and improves scalability and generalization across datasets, which is crucial in multimedia processing.
Submission Number: 4942