Learning Dual Enhanced Representation for Contrastive Multi-view Clustering

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Contrastive multi-view clustering is widely recognized for its effectiveness in mining feature representations across views via contrastive learning (CL), and it has gained significant attention in recent years. Most existing methods focus on feature-level and/or cluster-level CL, but two shortcomings remain. First, feature-level CL is limited by the influence of anomalies and heavily noisy data, resulting in insufficient mining of discriminative feature representations. Second, cluster-level CL lacks the guidance of global information and is always restricted by local diversity information. In this paper, we Learn dUal enhanCed rEpresentation for Contrastive Multi-view Clustering (LUCE-CMC) to effectively address the above challenges. The method contains two main parts: enhanced feature-level CL (En-FeaCL) and enhanced cluster-level CL (En-CluCL). Specifically, we first adopt a shared encoder to learn shared feature representations across multiple views and then obtain cluster-relevant information that benefits the clustering results. Moreover, we design a reconstitution approach that forces the model to concentrate on learning features critical to reconstructing the input data, which reduces the impact of noisy data and maximizes the sufficient discriminative information of different views to support the En-FeaCL part. Finally, instead of contrasting the view-specific clustering results as most existing methods do, the En-CluCL part enriches the cluster-level information by contrasting the cluster assignment from each view with the cluster assignment obtained from the shared fused features. The components of the proposed model are trained end-to-end and reinforce each other. Extensive experiments conducted on multi-view datasets show that the proposed LUCE-CMC outperforms established baselines to a considerable extent.
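The cluster-level contrast described in the abstract, where each view's cluster assignment is compared with the assignment obtained from the fused features, can be illustrated with a minimal sketch. The abstract does not give the loss, so everything below is an assumption: the InfoNCE-style form, the cosine similarity, the temperature `tau`, and the function names `cluster_contrast` and `en_clucl_loss` are hypothetical and are not taken from the paper. Assignments are N x K soft-assignment matrices (rows are samples, columns are clusters), and column k of a view is treated as a positive pair with column k of the fused assignment.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def columns(matrix):
    """Return the columns of an N x K matrix as K lists of length N."""
    return [list(col) for col in zip(*matrix)]


def cluster_contrast(view_assign, fused_assign, tau=0.5):
    """InfoNCE-style loss between one view's clusters and the fused clusters.

    Assumed form (not from the paper): for cluster k, the positive is the
    fused column k and the negatives are all other fused columns.
    """
    view_cols = columns(view_assign)
    fused_cols = columns(fused_assign)
    total = 0.0
    for k, q_view in enumerate(view_cols):
        logits = [cosine(q_view, q_fused) / tau for q_fused in fused_cols]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_denominator - logits[k]
    return total / len(view_cols)


def en_clucl_loss(per_view_assign, fused_assign, tau=0.5):
    """Average the view-to-fused contrast over all views (hypothetical)."""
    losses = [cluster_contrast(q, fused_assign, tau) for q in per_view_assign]
    return sum(losses) / len(losses)
```

Under these assumptions, a view whose cluster columns agree with the fused assignment yields a lower loss than the same view with its cluster labels permuted, which is the behaviour a cluster-level contrast guided by fused features is meant to encourage.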
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Multimodal Fusion, [Experience] Multimedia Applications
Relevance To Conference: Existing contrastive multi-view clustering methods, at both the feature level and the cluster level, are constrained by abnormal data and significant noise and lack global guidance. As a result, they cannot fully mine discriminative features, and the diversity of the view information they capture is limited. In this paper, we Learn dUal enhanCed rEpresentation for Contrastive Multi-view Clustering (LUCE-CMC) to address these challenges. LUCE-CMC makes two contributions: Enhanced Feature-Level Contrastive Learning (En-FeaCL) and Enhanced Cluster-Level Contrastive Learning (En-CluCL). First, En-FeaCL merges feature alignment and reconstitution, driving the model to prioritize the features that are key to reconstructing the input data. This reduces the impact of noisy data and raises the upper limit of feature learning from each view. Second, En-CluCL uses the fused high-level features to better identify similarities and differences across views in multi-view data. By combining the two, LUCE-CMC achieves improved and refined clustering performance. In short, LUCE-CMC offers an effective solution for handling complex relationships and information in multi-view data. Through dual enhancement (En-FeaCL and En-CluCL), it mines more valuable intrinsic view features. Consequently, LUCE-CMC holds significant potential for research and applications in fields such as object recognition, image classification, and multimodal clustering.
Submission Number: 2275